I'm working on creating a two-layer neural network with back-propagation. The NN is supposed to get its data from a 20001x17 matrix that holds the following information in each row:
-The first 16 cells hold integers ranging from 0 to 15, which act as features describing which of the 26 letters of the alphabet the row represents. For example, the following series of 16 values represents the letter A: [2 8 4 5 2 7 5 3 1 6 0 8 2 7 2 7].
-The 17th cell holds a number ranging from 1 to 26 representing the letter of the alphabet we want: 1 stands for A, 2 stands for B, etc.
The output layer of the NN consists of 26 outputs. Every time the NN is fed an input like the one described above, it's supposed to output a 1x26 vector containing zeros in all but the one cell that corresponds to the letter the input values represent. For example, the output [1 0 0 ... 0] would be the letter A, whereas [0 0 0 ... 1] would be the letter Z.
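(For reference, one compact way to build such a one-hot target matrix, assuming the labels sit in the 17th column and that ind2vec from the Neural Network Toolbox is available, would be:

labels = data(:,17);         %class indices 1..26, one per sample (drop any header row first, if present)
t = full(ind2vec(labels'));  %26xN matrix with a single 1 per column

The explicit loop in the code below does the same thing by hand.)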
Some things that are important before I present the code: I need to use the traingdm function, and the number of hidden neurons is fixed (for now) at 21.
Trying to implement the above concept, I wrote the following MATLAB code:
%%%%%%%%
%Start of code%
%%%%%%%%
%
%Initialize the input and target vectors
%(rows 2..20001 of the dataset are used, i.e. 20000 samples;
%row 1 is skipped, so preallocate 20000 columns, not 20001)
%
p = zeros(16,20000);
t = zeros(26,20000);
%
%Fill the input and target vectors from the dataset provided
%
for i=2:20001
    p(:,i-1) = data(i,1:16)';    %the 16 feature values of sample i-1
    t(data(i,17),i-1) = 1;       %one-hot target: row index = letter index
end
net = newff(minmax(p),[21 26],{'logsig' 'logsig'},'traingdm');
y1 = sim(net,p);                 %baseline output of the untrained network
net.trainParam.epochs = 200;
net.trainParam.show = 1;
net.trainParam.goal = 0.1;
net.trainParam.lr = 0.8;
net.trainParam.mc = 0.2;
net.divideFcn = 'dividerand';
net.divideParam.trainRatio = 0.7;
net.divideParam.testRatio = 0.2;
net.divideParam.valRatio = 0.1;
%[pn,ps] = mapminmax(p);
%[tn,ts] = mapminmax(t);
net = init(net);
[net,tr] = train(net,p,t);
y2 = sim(net,p);                 %use p (not pn: the mapminmax lines are commented out)
%%%%%%%%
%End of code%
%%%%%%%%
Now to my problem: I want my outputs to be as described; namely, each column of the y2 vector, for example, should be a representation of a letter. My code doesn't do that, though. Instead it produces results that vary greatly between 0 and 1, with values from 0.1 to 0.9.
My question is: is there some conversion I need to be doing that I am not? That is, do I have to convert my input and/or output data to a form by which I can actually see whether my NN is learning correctly?
Any input would be appreciated.
I don't know if this constitutes an actual answer or not, but here are some remarks.
See David MacKay's textbook for a nice intro to neural nets, which will make the probabilistic connection clear. Take a look at this paper from Geoff Hinton's group, which describes the task of predicting the next character given the context, for details on the correct representation and activation functions (although beware: their method is non-trivial and uses a recurrent net with a different training method).
This is normal. Your output layer is using a log-sigmoid transfer function, and that will always give you some intermediate output between 0 and 1.
What you would usually do would be to look for the output with the largest value -- in other words, the most likely character.
This would mean that, for every column in y2, you're looking for the index of the row that contains the largest value in that column. You can compute this as follows:

[dummy,I] = max(y2);

I is then a vector containing the index of the largest value in each column.

You might also consider:
-a hardlim fcn in the output layer,
-trainlm or trainrp for training the network,
-mapminmax for pre-processing the data set.

You can think of y2 as an output probability distribution for each input being one of the 26 alphabet characters. For example, if one column of y2 says:

0.1
0.5
0.3
0.1

then there is a 50% probability that this character is B (if we assume only 4 possible outputs).
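As a short sketch of that decoding (the character mapping and the accuracy check are additions here, assuming t is the one-hot target matrix from the question):

[dummy,I] = max(y2);           %I(j) = predicted class index for column j
letters = char('A' + I - 1);   %map indices 1..26 to characters 'A'..'Z'
[dummy,truth] = max(t);        %true class indices from the one-hot targets
accuracy = mean(I == truth);   %fraction of correctly classified samples

This gives you a single per-sample prediction you can compare directly against the labels, rather than eyeballing raw sigmoid outputs.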
==REMARK==
It is preferable to avoid using target values of 0 and 1 to encode the output of the network.
The reason is that the 'logsig' sigmoid transfer function, logsig(x) = 1/(1+exp(-x)), only approaches 0 and 1 asymptotically and cannot produce these output values given finite weights. If you attempt to train the network to fit target values of exactly 0 and 1, gradient descent will force the weights to grow without bound.
So instead of 0 and 1, try using values of 0.04 and 0.9, for example, so that [0.9, 0.04, ..., 0.04] is the target output vector for the letter A.
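A minimal sketch of that rescaling, assuming t is the 0/1 target matrix built in the question (the variable names here are illustrative):

lo = 0.04; hi = 0.9;             %soft target values instead of 0 and 1
t_soft = t*(hi - lo) + lo;       %maps 0 -> 0.04 and 1 -> 0.9
[net,tr] = train(net,p,t_soft);  %train against the softened targets

The max-based decoding described above still works unchanged, since only the target magnitudes shift, not which row is largest.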
Reference:
Thomas M. Mitchell, Machine Learning, McGraw-Hill, 1997, pp. 114-115.