Multilayer perceptron - backpropagation

Published 2019-04-16 11:32

I have a school project to program a multilayer perceptron that classifies data into three classes. I have implemented the backpropagation algorithm from http://home.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html. I have checked my algorithm (by manually calculating each step of backpropagation) to see whether it really follows the explained steps, and it does.

For classification I am using a one-hot code; my inputs are vectors with 2 values and there are three output neurons (one for each class). After each epoch I shuffle the input data. As the activation function I am using the sigmoid. I tried to implement softmax too, but I haven't found what the derivative of softmax looks like. Is the softmax derivative needed for adjusting the weights? To check whether the network classified an input successfully, I compare whether the index of the output neuron with the maximal output corresponds to the position of the 1 in the current input's one-hot code vector.
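
From what I have read, when softmax is combined with a cross-entropy loss, the error term on each output neuron simplifies to (output - target), so no separate softmax derivative shows up in the weight update. A minimal sketch of how such a softmax could look (illustrative only, this is not code from my project):

    // Hypothetical sketch only (not my real code): softmax over the output
    // activations. With a cross-entropy loss, the error term of output
    // neuron k reduces to (output_k - target_k), so no explicit softmax
    // derivative is needed in the weight update.
    #include <vector>
    #include <cmath>
    #include <algorithm>
    using namespace std;

    vector<double> softmax(const vector<double>& activations) {
      double maxA = *max_element(activations.begin(), activations.end());
      vector<double> out(activations.size());
      double sum = 0.0;
      for (size_t k = 0; k < activations.size(); k++) {
        out[k] = exp(activations[k] - maxA);   // shift by the max for numerical stability
        sum += out[k];
      }
      for (size_t k = 0; k < out.size(); k++) {
        out[k] /= sum;
      }
      return out;
    }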

But my implementation doesn't train the neural network. I have been working on this and debugging for several days, and searching the internet for what I am doing wrong, but I haven't found an answer. I really don't know where I am making a mistake. My neural network trains successfully when I have 10 inputs, but when I have 100, 200, 400, or 800 inputs it starts cycling once about half of the inputs are classified correctly. As I said, my backpropagation algorithm itself is correct. The whole C++ project in Visual Studio 2010, with the input files, is here: http://www.st.fmph.uniba.sk/~vajda10/mlp.zip

Structures:

    struct input {
      vector<double> x;
      vector<double> cls;
    };

    struct neuron {
      double output;
      double error;
      neuron(double o, double e): output(o), error(e) { };
    };
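
To illustrate how I fill these structures (the values here are made up just for this example), one training sample with two input values and a one-hot class vector for three classes looks like this:

    input sample;                  // illustrative values only
    sample.x.push_back(0.25);      // first of the two input values
    sample.x.push_back(-1.30);     // second input value
    sample.cls.push_back(0.0);     // one-hot class vector:
    sample.cls.push_back(1.0);     // this sample belongs to the second class
    sample.cls.push_back(0.0);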

Global variables:

    double alpha = 0.5;
    vector<vector<input>> data;

    vector<vector<neuron>> hiddenNeurons;
    vector<neuron> outputNeurons;
    vector<vector<vector<double>>> weights;
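
The indexing convention is weights[i][j][k]: the weight from neuron j of layer i-1 (or from input j when i == 0) to neuron k of layer i, and the last layer of weights feeds the output neurons. A made-up sizing example for a 2-input network with one hidden layer of 4 neurons and 3 outputs (my real sizes are read from the input file) would be:

    int nInputs = 2, nHidden = 4, nOutputs = 3;         // made-up sizes, for illustration
    weights.resize(2);                                  // one hidden layer + output layer
    weights[0].assign(nInputs, vector<double>(nHidden, 0.1));   // input  -> hidden
    weights[1].assign(nHidden, vector<double>(nOutputs, 0.1));  // hidden -> output
    hiddenNeurons.assign(1, vector<neuron>(nHidden, neuron(0.0, 0.0)));
    outputNeurons.assign(nOutputs, neuron(0.0, 0.0));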

Here is my code for backpropagation algorithm:

    for (int b = 0; b < data[0].size(); b++) {
      // calculate output of hidden neurons
      for (int i = 0; i < hiddenNeurons.size(); i++) {
        for (int j = 0; j < hiddenNeurons[i].size(); j++) {
          double activation = neuronActivation(0, b, i, j);
          hiddenNeurons[i][j].output = sigmoid(activation);
        }
      }
      double partError = 0;
      // calculate output and errors on output neurons
      for (int k = 0; k < outputNeurons.size(); k++) {
        double activation = neuronActivation(0, b, hiddenNeurons.size(), k);
        outputNeurons[k].output = sigmoid(activation);
        outputNeurons[k].error = data[0][b].cls[k] - outputNeurons[k].output;
        partError += pow(outputNeurons[k].error, 2);
      }

      error += sqrt(partError)/outputNeurons.size();

      // if classification is wrong
      if (data[0][b].cls[maxOutputIndex(outputNeurons)] != 1) {
        wrongClass++;

        // error backpropagation
        for (int i = hiddenNeurons.size()-1; i >= 0; i--) {
          for (int j = 0; j < hiddenNeurons[i].size(); j++) {
            hiddenNeurons[i][j].error = 0.0;

            if (i < hiddenNeurons.size()-1) {
              for (int k = 0; k < hiddenNeurons[i+1].size(); k++) {
                hiddenNeurons[i][j].error += hiddenNeurons[i+1][k].error * weights[i+1][j][k];
              }
            }
            else {
              for (int k = 0; k < outputNeurons.size(); k++) {
                hiddenNeurons[i][j].error += outputNeurons[k].error * weights[i+1][j][k];
              }
            }
          }
        }

        // adjust weights
        for (int i = 0; i < weights.size(); i++) {
          int n;
          if (i < weights.size()-1) {
            n = hiddenNeurons[i].size();
          }
          else {
            n = outputNeurons.size();
          }

          for (int k = 0; k < n; k++) {
            for (int j = 0; j < weights[i].size(); j++) {
              double y;
              if (i == 0) {
                y = data[0][b].x[j];
              }
              else {
                y = hiddenNeurons[i-1][j].output;
              }

              if (i < weights.size()-1) {
                weights[i][j][k] += alpha * hiddenNeurons[i][k].error * derivedSigmoid(hiddenNeurons[i][k].output) * y;
              }
              else {
                weights[i][j][k] += alpha * outputNeurons[k].error * derivedSigmoid(outputNeurons[k].output) * y;
              }
            }
          }
        }
      }
    }
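
To make the code above self-contained: sigmoid is the logistic function, derivedSigmoid computes the derivative from a neuron's already-computed output, and maxOutputIndex returns the index of the output neuron with the largest output. Simplified sketches of what I mean (the exact definitions are in the linked project):

    // Simplified sketches of the helpers used above; the real definitions
    // are in the linked zip. Needs <cmath> for exp.
    double sigmoid(double activation) {
      return 1.0 / (1.0 + exp(-activation));
    }

    // derivative of the sigmoid, expressed through the neuron's output o
    double derivedSigmoid(double o) {
      return o * (1.0 - o);
    }

    int maxOutputIndex(const vector<neuron>& outputs) {
      int best = 0;
      for (int k = 1; k < (int)outputs.size(); k++) {
        if (outputs[k].output > outputs[best].output) {
          best = k;
        }
      }
      return best;
    }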

Please, can anyone tell me what I am doing wrong, or give me advice on where I should look for the mistake? I hope I have mentioned everything important. Please forgive my bad English.

1 Answer

别忘想泡老子
2019-04-16 12:20

A Gaussian classifier (like BackpropNN) will only spline continuous exemplar sets.

Since your net learns on small sets of exemplars, I assume the small set has no discontinuities.

Here, for example, is a discontinuity in a training set of exemplars (input vector ---> output vector):

[0,1,0,1,1,0,1,0] ---> [0,1,0]
[0,1,0,1,1,0,1,0] ---> [1,1,0]

The algorithm can't classify (spline) this. The output vector for a given input vector must be unique (continuous).

If you are randomly generating your exemplars, this would explain why small sets always seem to work - low probability of generating a discontinuity. Larger sets will guarantee this problem.

So, just scan for and remove any problem exemplars, if this is indeed the problem. Remember that the transfer function is actually a normalizer, so real input vectors that seem different might normalize to identical values.
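
A rough way to scan for such conflicts, using the asker's input struct (an illustrative sketch only; it uses exact equality, so near-duplicates would need a tolerance):

    // Flag exemplar sets where the same input vector x appears with two
    // different one-hot class vectors cls - i.e. a discontinuity.
    bool hasDiscontinuity(const vector<input>& set) {
      for (size_t a = 0; a < set.size(); a++) {
        for (size_t b = a + 1; b < set.size(); b++) {
          if (set[a].x == set[b].x && set[a].cls != set[b].cls) {
            return true;   // same input, conflicting target
          }
        }
      }
      return false;
    }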

If you are still getting stuck in local minima, try changing the learning rate (your alpha). You have it hard-coded to 0.5; try other values.

As a final effort, I also recommend replacing the sigmoid transfer function with a step function. The sigmoid is just an analog (biological) version of this digital function; remove that conversion by using the digital transfer function (a step function) directly.

Analog vs Digital transfer functions

The reason sigmoids are used in backprop is that Hinton's original work was from cognitive science, and the transfer function of a neuron is a sigmoid - the closest natural analog to a digital function.
