Neural Network Error oscillating with each training example

Posted 2020-03-27 14:09

Question:

I've implemented a back-propagating neural network and trained it on my data. The data alternates between sentences in English and Afrikaans. The neural network is supposed to identify the language of the input.

The structure of the network is 27 × 16 × 2. The input layer has 26 inputs, one for each letter of the alphabet, plus a bias unit.

My problem is that the error swings violently in opposite directions as each new training example is encountered. As I mentioned, the training examples are read in alternating fashion (English, Afrikaans, English, ...).

I can train the network to identify all English, or all Afrikaans, but not to identify both in the same pass.

The y-axis below is the output signal error for each of the two output nodes (English and Afrikaans), and the x-axis is the number of the training example. In a way, it does exactly what I programmed it to do: when the example is English, it changes the weights to identify English better. However, in doing so, it makes the network worse at predicting Afrikaans. This is why the error bounces between positive and negative values.

Clearly this isn't how it should work, but I'm stuck.

I feel as though the error is conceptual on my part, but here is the relevant code:

public void train() throws NumberFormatException, IOException{

    // Training Accuracy
    double at = 0;

    //epoch
    int epoch = 0;

    int tNum = 0;

    for(; epoch < epochMax; epoch++){

        // Read the training data file from the project's resources
        BufferedReader br = new BufferedReader(new InputStreamReader(this.getClass().
                getResourceAsStream("/TrainingData/" + trainingData.getName())));

        while ((line = br.readLine()) != null) {

            boolean classified = false;

            tNum++;

            // Set the correct classification Tk
            t[0] = Integer.parseInt(line.split("\t")[0]); // Afrikaans
            t[1] = (t[0] == 0) ? 1 : 0; // English


            // Convert training string to char array
            char trainingLine[] = line.split("\t")[1].toLowerCase().toCharArray();


            // Increment the element of input layer z that matches
            // the position of the char in the alphabet:
            // a == 0, b == 1, etc.
            for(int l = 0; l < trainingLine.length; l++){
                if((int)trainingLine[l] >= 97 && (int)trainingLine[l] <= 122)
                    z[(int)trainingLine[l] % 97]++;
            }


            /*System.out.println("Z   " + Arrays.toString(z));
            System.out.println();*/

            // Scale Z
            for(int i = 0; i < z.length-1; i++){
                z[i] = scale(z[i], 0, trainingLine.length, -Math.sqrt(3),Math.sqrt(3));
            }

         /*----------------------------------------------------------------
          *                  SET NET HIDDEN LAYER
          * The net input of each hidden unit j is the dot product of
          * the input vector z with row j of the weights matrix ij. */


            for(int j = 0; j < ij.length; j++){
                double[] dotProduct = multiplyVectors(z, ij[j]);
                y[j] = sumVector(dotProduct);   

            }


            /*----------------------------------------------------------------
             *                 SET ACTIVATION HIDDEN LAYER 
             */

            for(int j = 0; j < y.length-1; j++){
                y[j] = sigmoid(y[j], .3, .7);
            }

            /*----------------------------------------------------------------
             *                       SET NET OUTPUT LAYER
             * The net input of each output unit k is the dot product of
             * the hidden vector y with row k of the weights matrix jk. */


            for(int k = 0; k < jk.length; k++){
                double[] dotProduct = multiplyVectors(y, jk[k]);
                o[k] = sumVector(dotProduct);
            }

            /*----------------------------------------------------------------
             *                   SET ACTIVATION OUTPUT LAYER
             */

            for(int k = 0; k < o.length; k++){
                o[k] = sigmoid(o[k], .3, .7);
            }

            /*----------------------------------------------------------------
             *                     SET OUTPUT ERROR
             * For each training example, evaluate the error.
             * Error is defined as (Tk - Ok)
             * Correct classifications will result in zero error:
             *          (1 - 1) = 0
             *          (0 - 0) = 0
             */

            for(int k = 0; k < o.length; k++){
                oError[k] = t[k] - o[k];
            }

            /*----------------------------------------------------------------
             *                     SET TRAINING ACCURACY
             * If error is 0, then a 1 indicates a successful prediction.
             * If error is 1, then a 0 indicates an unsuccessful prediction.
             */

            if(quantize(o[0],.3, .7) == t[0] && quantize(o[1], .3, .7) == t[1]){
                classified = true;
                at += 1;
            }


            // Only compute errors and change weights for classification errors
            if(classified){
                continue;
            }

            /*----------------------------------------------------------------
             *                  CALCULATE OUTPUT SIGNAL ERROR
             *                 Error of ok = -(tk - ok)(1 - ok)ok
             */


            for(int k = 0; k < o.length; k++){
                oError[k] = outputError(t[k], o[k]);

            }

            /*----------------------------------------------------------------
             *                  CALCULATE HIDDEN LAYER SIGNAL ERROR
             *                  
             */

            // For each hidden unit j, back-propagate every output error
            // oError[k] through its weight jk[k][j], scaled by the
            // sigmoid derivative (1 - yj)yj.


            for(int j = 0; j < y.length; j++){
                for(int k = 0; k < o.length; k++){
                    /*System.out.println(j+"-"+k);*/
                    yError[j] +=  oError[k] * jk[k][j] * (1 -  y[j]) * y[j];

                }
            }   
            /*----------------------------------------------------------------
             *                  CALCULATE NEW WEIGHTS FOR HIDDEN-JK-OUTPUT
             *                  
             */

            for(int k = 0; k < o.length; k++){
                for(int j = 0; j < y.length; j++){
                    djk[k][j] = (-1*learningRate)*oError[k]*y[j] + momentum*djk[k][j];

                    // Old weights = themselves + new delta weight
                    jk[k][j] += djk[k][j]; 

                }
            }

            /*----------------------------------------------------------------
             *         CALCULATE NEW WEIGHTS FOR INPUT-IJ-HIDDEN
             *                  
             */

            for(int j = 0; j < y.length-1; j++){
                for(int i = 0; i < z.length; i++){

                    dij[j][i] = (-1*learningRate)*yError[j]*z[i] + momentum*dij[j][i];

                    // Old weights = themselves + new delta weight
                    ij[j][i] += dij[j][i]; 

                }
            }
        }
    }
    // Accuracy Percentage
    double at_prec = (at/tNum) * 100;

    System.out.println("Training Accuracy: " + at_prec);    
}
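
(For readability: the snippet relies on fields declared elsewhere in the class. The skeleton below is a hypothetical reconstruction inferred from how the fields are used; it is not part of the original post, and the dimensions and hyper-parameter values are guesses.)

import java.io.File;

public class LanguageNet {
    double[] z = new double[27];          // input layer: 26 letter counts + bias
    double[] y = new double[17];          // hidden layer: 16 units + bias
    double[] o = new double[2];           // output layer: Afrikaans / English
    double[] t = new double[2];           // target values Tk
    double[] oError = new double[2];      // output signal errors
    double[] yError = new double[17];     // hidden-layer signal errors
    double[][] ij = new double[16][27];   // weights input -> hidden
    double[][] jk = new double[2][17];    // weights hidden -> output
    double[][] dij = new double[16][27];  // previous deltas (momentum terms)
    double[][] djk = new double[2][17];
    double learningRate = 0.1;            // hyper-parameter (value is a guess)
    double momentum = 0.9;                // hyper-parameter (value is a guess)
    int epochMax = 100;                   // passes over the training file
    String line;                          // current line of the training file
    File trainingData;                    // training data file
}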

Answer 1:

I agree with the comments that this model is probably not the best for your classification problem, but if you are interested in trying to get it to work, I will give you the reason I think it oscillates and the way I would tackle the problem.

From my understanding of your question and comments, I cannot see what the network actually "learns" in this instance. You feed letters in (is this the number of times each letter occurs in the sentence?) and you force it to map to an output. Let's say you use only English for now, and English corresponds to an output of 1. So you "train" it on one sentence and, for argument's sake, it chooses the letter "a" as the determining input, which is quite a common letter. It sets the network weights such that when it sees "a" the output is 1, and all other letter inputs get weighted down so that they don't influence the output. It might not be so black and white, but it could be doing something very similar. Now, every time you feed another English sentence in, it only has to see an "a" to give a correct output.

Doing the same for Afrikaans with an output of zero, it maps "a" to zero. So, every time you alternate between the two languages, it completely reassigns the weightings; you're not building on any structure. The back-propagated error is basically always a fixed value because there are no degrees of rightness or wrongness, it's one or the other. So I would expect it to oscillate exactly as you are seeing.

EDIT: I think this boils down to the mere presence of letters being used to classify the language, with the network expected to produce one of two polar outputs, rather than anything about the relationships between letters that define the language.
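
To make the mechanism concrete, here is a small toy, not the poster's network, just a single "weight" chasing alternating 1/0 targets with plain per-example updates; it reproduces the sign-flipping error described above:

public class OscillationToy {
    public static void main(String[] args) {
        double w = 0.5;   // single stand-in "weight"
        double lr = 0.3;  // learning rate
        for (int n = 0; n < 10; n++) {
            double target = (n % 2 == 0) ? 1.0 : 0.0; // English, Afrikaans, ...
            double error = target - w;                // same (Tk - Ok) error form
            w += lr * error;                          // update after every example
            System.out.printf("example %d: target=%.0f  error=%+.3f  w=%.3f%n",
                    n, target, error, w);
        }
        // The error flips sign on every example, mirroring the plot in the question.
    }
}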

On a conceptual level, I would add a full pre-processing stage to get some statistics. Off the top of my head (I don't know the language), I might calculate:

- the ratio of the letter "a" to "c" occurring in a sentence
- the ratio of the letter "d" to "p" occurring in a sentence
- the average word length in a sentence
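
A sketch of such a feature extractor (class and method names are hypothetical, and the three statistics are just the off-the-cuff examples above):

import java.util.Arrays;

public class LanguageFeatures {

    private static int count(String s, char c) {
        int n = 0;
        for (char ch : s.toCharArray()) if (ch == c) n++;
        return n;
    }

    // Returns { a:c ratio, d:p ratio, average word length }
    static double[] extract(String sentence) {
        String s = sentence.toLowerCase();
        double aToC = (count(s, 'a') + 1.0) / (count(s, 'c') + 1); // +1 avoids /0
        double dToP = (count(s, 'd') + 1.0) / (count(s, 'p') + 1);
        double avgWordLen = Arrays.stream(s.split("\\s+"))
                                  .mapToInt(String::length)
                                  .average().orElse(0);
        return new double[]{ aToC, dToP, avgWordLen };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(extract("Die kat sit op die mat")));
    }
}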

Do this for 50 sentences of each language. Feed all the data in at once and train on the whole set (70% for training, 15% for validation, 15% for testing). You cannot train a network on a single value each time (as I think you are doing?); it needs to see the whole picture. Now your output is not so black and white: it has the flexibility to map to a value between 0 and 1, not an absolute each time. Anything above 0.5 is English, anything below 0.5 is Afrikaans. Start with, say, 10 statistical parameters for the languages, 5 neurons in the hidden layer, and 1 neuron in the output layer.
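
And a minimal sketch of that workflow (names are hypothetical): shuffle the full set, split it 70/15/15, and read the single output neuron against the 0.5 threshold:

import java.util.Collections;
import java.util.List;

public class TrainingSetup {

    // Shuffle, then split into 70% train / 15% validation / 15% test.
    // Assumes a mutable list such as an ArrayList of feature vectors.
    static List<List<double[]>> split(List<double[]> examples) {
        Collections.shuffle(examples);
        int trainEnd = (int) (examples.size() * 0.70);
        int valEnd   = (int) (examples.size() * 0.85);
        return List.of(
                examples.subList(0, trainEnd),          // training set
                examples.subList(trainEnd, valEnd),     // validation set
                examples.subList(valEnd, examples.size())); // test set
    }

    // Single output neuron: above 0.5 is English, below is Afrikaans.
    static String classify(double output) {
        return output > 0.5 ? "English" : "Afrikaans";
    }
}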