I've implemented a back-propagating neural network and trained it on my data. The data alternates between sentences in English and Afrikaans, and the network is supposed to identify the language of the input.

The structure of the network is 27 * 16 * 2 (input * hidden * output). The input layer has 26 inputs, one for each letter of the alphabet, plus a bias unit.
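For reference, the relevant arrays are dimensioned roughly like this (a sketch just to make the shapes concrete; I'm counting each layer's bias unit in the last slot of its vector, so the hidden 16 includes its bias the same way the input 27 does):

double[] z = new double[27];         // 26 letter counts + 1 bias unit
double[] y = new double[16];         // 15 hidden units + 1 bias unit
double[] o = new double[2];          // output units: Afrikaans and English
double[][] ij = new double[15][27];  // input  -> hidden weights
double[][] jk = new double[2][16];   // hidden -> output weights
double[][] dij = new double[15][27]; // previous input  -> hidden deltas, for momentum
double[][] djk = new double[2][16];  // previous hidden -> output deltas, for momentum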
My problem is that the error swings violently in opposite directions as each new training example is encountered. As I mentioned, the training examples are read in alternating fashion (English, Afrikaans, English, ...).

I can train the network to identify all English, or all Afrikaans, but not to identify both in the same pass.

The y-axis below is the output signal error for each of the two output nodes (English and Afrikaans), and the x-axis is the number of the training example. In a way, it does exactly what I programmed it to do: when the example is English, it changes the weights to identify English better. However, in doing so, it makes the network worse at predicting Afrikaans. That is why the error swings between positive and negative values.
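To see the mechanism in isolation, here is a toy sketch (a single weight with a sigmoid, not my actual network): strictly per-example updates on alternating 1/0 targets produce exactly this sign-flipping error.

public class OscillationToy {
    public static void main(String[] args) {
        double w = 0.0;                              // single weight
        double lr = 2.0;                             // large step, to exaggerate the effect
        for (int n = 0; n < 6; n++) {
            double target = (n % 2 == 0) ? 1 : 0;    // English, Afrikaans, English, ...
            double out = 1.0 / (1.0 + Math.exp(-w)); // sigmoid activation
            double err = target - out;               // the sign flips on every example
            w += lr * err * out * (1 - out);         // gradient step toward THIS example only
            System.out.printf("n=%d target=%.0f err=%+.3f w=%+.3f%n", n, target, err, w);
        }
    }
}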
Clearly this isn't how it should work, but I'm stuck. I suspect the error is conceptual on my part, but here is the relevant code:
public void train() throws NumberFormatException, IOException {
    // Training accuracy
    double at = 0;
    // Epoch counter
    int epoch = 0;
    int tNum = 0;
    for (; epoch < epochMax; epoch++) {
        // Read the training file from the TrainingData folder of the existing project
        BufferedReader br = new BufferedReader(new InputStreamReader(this.getClass()
                .getResourceAsStream("/TrainingData/" + trainingData.getName())));
        while ((line = br.readLine()) != null) {
            Boolean classified = false;
            tNum++;
            // Set the correct classification Tk
            t[0] = Integer.parseInt(line.split("\t")[0]); // Afrikaans
            t[1] = (t[0] == 0) ? 1 : 0;                   // English
            // Convert the training string to a char array
            char trainingLine[] = line.split("\t")[1].toLowerCase().toCharArray();
            // Increment the index of input layer z that matches the
            // position of the char in the alphabet: a == 0, b == 1, etc.
            for (int l = 0; l < trainingLine.length; l++) {
                if ((int) trainingLine[l] >= 97 && (int) trainingLine[l] <= 122)
                    z[(int) trainingLine[l] % 97]++;
            }
            // Scale z into the range [-sqrt(3), sqrt(3)]
            for (int i = 0; i < z.length - 1; i++) {
                z[i] = scale(z[i], 0, trainingLine.length, -Math.sqrt(3), Math.sqrt(3));
            }
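            // Note: the loop above stops at z.length - 1, so the last element
            // of z (the bias unit) keeps its value and is not rescaled.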
            /*----------------------------------------------------------------
             * SET NET HIDDEN LAYER
             * Each jth unit of the hidden layer is the dot product of the
             * input layer z and row j of the input-to-hidden weight matrix ij.
             */
            for (int j = 0; j < ij.length; j++) {
                double[] dotProduct = multiplyVectors(z, ij[j]);
                y[j] = sumVector(dotProduct);
            }
            /*----------------------------------------------------------------
             * SET ACTIVATION HIDDEN LAYER
             */
            for (int j = 0; j < y.length - 1; j++) {
                y[j] = sigmoid(y[j], .3, .7);
            }
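            // As above, y.length - 1 leaves the last element of y
            // (the hidden layer's bias unit) out of the sigmoid.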
            /*----------------------------------------------------------------
             * SET NET OUTPUT LAYER
             * Each kth output unit is the dot product of the hidden layer y
             * and row k of the hidden-to-output weight matrix jk.
             */
            for (int k = 0; k < jk.length; k++) {
                double[] dotProduct = multiplyVectors(y, jk[k]);
                o[k] = sumVector(dotProduct);
            }
            /*----------------------------------------------------------------
             * SET ACTIVATION OUTPUT LAYER
             */
            for (int k = 0; k < o.length; k++) {
                o[k] = sigmoid(o[k], .3, .7);
            }
            /*----------------------------------------------------------------
             * SET OUTPUT ERROR
             * For each training example, evaluate the error.
             * Error is defined as (Tk - Ok).
             * Correct classifications result in zero error:
             *   (1 - 1) = 0
             *   (0 - 0) = 0
             */
            for (int k = 0; k < o.length; k++) {
                oError[k] = t[k] - o[k];
            }
            /*----------------------------------------------------------------
             * SET TRAINING ACCURACY
             * A prediction counts as successful only if both quantized
             * outputs match their targets.
             */
            if (quantize(o[0], .3, .7) == t[0] && quantize(o[1], .3, .7) == t[1]) {
                classified = true;
                at += 1;
            }
            // Only compute signal errors and change weights for misclassified examples
            if (classified) {
                continue;
            }
            /*----------------------------------------------------------------
             * CALCULATE OUTPUT SIGNAL ERROR
             * Error of ok = -(tk - ok)(1 - ok)ok
             */
            for (int k = 0; k < o.length; k++) {
                oError[k] = outputError(t[k], o[k]);
            }
            /*----------------------------------------------------------------
             * CALCULATE HIDDEN LAYER SIGNAL ERROR
             * The term (1 - yj)yj is the sigmoid derivative.
             * For each jth hidden unit, sum over the output units k the
             * product of the output error, the weight jk[k][j], and (1 - yj)yj.
             */
            for (int j = 0; j < y.length; j++) {
                for (int k = 0; k < o.length; k++) {
                    yError[j] += oError[k] * jk[k][j] * (1 - y[j]) * y[j];
                }
            }
            /*----------------------------------------------------------------
             * CALCULATE NEW WEIGHTS FOR HIDDEN-JK-OUTPUT
             */
            for (int k = 0; k < o.length; k++) {
                for (int j = 0; j < y.length; j++) {
                    djk[k][j] = (-1 * learningRate) * oError[k] * y[j] + momentum * djk[k][j];
                    // New weight = old weight + new delta weight
                    jk[k][j] += djk[k][j];
                }
            }
            /*----------------------------------------------------------------
             * CALCULATE NEW WEIGHTS FOR INPUT-IJ-HIDDEN
             */
            for (int j = 0; j < y.length - 1; j++) {
                for (int i = 0; i < z.length; i++) {
                    dij[j][i] = (-1 * learningRate) * yError[j] * z[i] + momentum * dij[j][i];
                    // New weight = old weight + new delta weight
                    ij[j][i] += dij[j][i];
                }
            }
        }
    }
    // Accuracy percentage
    double at_prec = (at / tNum) * 100;
    System.out.println("Training Accuracy: " + at_prec);
}
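For completeness, the helper methods behave roughly like this (simplified sketches of my versions, not the exact code; sigmoid is the usual logistic squashing):

private static double scale(double x, double oldMin, double oldMax,
                            double newMin, double newMax) {
    // Linearly remap x from [oldMin, oldMax] into [newMin, newMax]
    return newMin + (x - oldMin) * (newMax - newMin) / (oldMax - oldMin);
}

private static double outputError(double t, double o) {
    // -(tk - ok)(1 - ok)ok, as in the comment in train()
    return -(t - o) * (1 - o) * o;
}

private static int quantize(double activation, double lo, double hi) {
    // Threshold the activation: at or above hi counts as 1, at or below lo
    // counts as 0, and anything in between counts as unclassified (-1)
    if (activation >= hi) return 1;
    if (activation <= lo) return 0;
    return -1;
}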