Encog Neural Net - How to structure training data?

2019-07-08 10:37发布

问题:

Every example I've seen for Encog neural nets has involved XOR or something very simple. I have around 10,000 sentences and each word in the sentence has some type of tag. The input layer needs to take 2 inputs, the previous word and the current word. If there is no previous word, then the 1st input is not activated at all. I need to go through each sentence like this. Each word is contingent on the previous word, so I can't just have an array that looks similar to the XOR example. Furthermore, I don't really want to load all the words from 10,000+ sentences into an array, I'd rather scan one sentence at a time and once I reach EOF, start back at the beginning.

How should I go about doing this? I'm not super comfortable with Encog because all the examples I've seen have either been XOR or extremely complicated.

There are 2 inputs... Each input consists of 30 neurons. The chance of the word being a certain tag is used as inputs. So, most of the neurons get 0, the others get probability inputs like .5, .3, and .2. When I say 'aren't activated' I just mean that all the neurons are set to 0. The output layer represents all the possible tags, so, its 30. Whatever one of the output neurons has the highest number is the tag that is chosen.

I'm not sure how to go through all 10,000 sentences and look-up each word in each sentence (for the inputs and activate that input) in the 'demos' of Encog that I've seen.)

It seems that the networks are trained with a single array holding all training data, and that is looped through until the network is trained. I would like to train the network with many different arrays (an array per sentence) and then look through them all again.

This format is clearly not going to work for what I'm doing:

    do {
    train.iteration();
    System.out.println(
    "Epoch #" + epoch + " Error:" + train.getError());
    epoch++;
    } while(train.getError() > 0.01);

回答1:

So, I'm not sure how to tell you this, but that's not how a neural net works. You can't just use a word as an input, and you can't just "not activate" an input either. At a very basic level, this is what you need to run a neural network on a problem:

  1. A fixed-length input vector (whatever you are feeding in, it must be represented numerically with a fixed length. Each entry in the vector is a single number)
  2. A set of labels (each input vector must correspond to a single, fixed-length output vector)

Once you have those two, the neural net classifies an example, then edits itself to get as close as possible to the labels.

If you're looking to work with words and a deep learning framework, you should map your words to an existing vector representation (I would highly recommend glove, but word2vec is decent as well) and then learn on top of that representation.


After having a deeper understanding of what you're attempting here I think the issue is that you're dealing with 60 inputs, not one. These inputs are the concatenation of the existing predictions for both words (in the case with no first word the first 30 entries are 0). You should take care of the mapping yourself (should be very straightforward), and then just treat it as trying to predict 30 numbers with 60 numbers.

I feel obliged to tell you that the way you've framed the problem you will see awful performance. When dealing with a sparse (mostly zeros) vector and such a small dataset deep learning techniques will show VERY poor performance compared to other methods. You are better off using glove + svm or a random forest model on your existing data.



回答2:

You can use other implementations of MLDataSet besides BasicMLDataSet.

I ran into a similar problem with windows of DNA sequences. Building an array of all the windows would not have been scalable.

Instead, I implemented my own VersatileDataSource, and wrapped it in a VersatileMLDataSet.

VersatileDataSource has just a few methods to implement:

public interface VersatileDataSource {
    String[] readLine();
    void rewind();
    int columnIndex(String name);
}

For each readLine(), you could return the inputs for the previous/current word, and advance the position to the next word.