Based on PyBrain's tutorials, I managed to knock together the following code:
#!/usr/bin/env python2
# coding: utf-8
from pybrain.structure import FeedForwardNetwork, LinearLayer, SigmoidLayer, FullConnection
from pybrain.datasets import SupervisedDataSet
from pybrain.supervised.trainers import BackpropTrainer

# build a 2-3-1 feedforward network
n = FeedForwardNetwork()
inLayer = LinearLayer(2)
hiddenLayer = SigmoidLayer(3)
outLayer = LinearLayer(1)
n.addInputModule(inLayer)
n.addModule(hiddenLayer)
n.addOutputModule(outLayer)
in_to_hidden = FullConnection(inLayer, hiddenLayer)
hidden_to_out = FullConnection(hiddenLayer, outLayer)
n.addConnection(in_to_hidden)
n.addConnection(hidden_to_out)
n.sortModules()

# XOR training data
ds = SupervisedDataSet(2, 1)
ds.addSample((0, 0), (0,))
ds.addSample((0, 1), (1,))
ds.addSample((1, 0), (1,))
ds.addSample((1, 1), (0,))

# train with backpropagation
trainer = BackpropTrainer(n, ds)
# trainer.train()
trainer.trainUntilConvergence()

print n.activate([0, 0])[0]
print n.activate([0, 1])[0]
print n.activate([1, 0])[0]
print n.activate([1, 1])[0]
It's supposed to learn the XOR function, but the results seem quite random:
0.208884929522
0.168926515771
0.459452834043
0.424209192223
or
0.84956138664
0.888512762786
0.564964077401
0.611111147862
There are four problems with your approach, all of them easy to identify after reading the Neural Network FAQ:
Why use a bias/threshold?: you should add a bias node. The lack of a bias makes learning very limited: the separating hyperplane represented by the network can only pass through the origin. With a bias node it can move freely and fit the data better:
from pybrain.structure import BiasUnit  # BiasUnit needs to be imported as well

bias = BiasUnit()
n.addModule(bias)
bias_to_hidden = FullConnection(bias, hiddenLayer)
n.addConnection(bias_to_hidden)
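One caveat: your script calls n.sortModules() before this point, and PyBrain only wires in modules that were present when the network was sorted. So add the bias module and its connection before that call, or sort again afterwards:
n.sortModules()  # sort again after changing the topology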
Why not code binary inputs as 0 and 1?: all your samples lie in a single quadrant of the sample space. Move them so they are scattered around the origin:
ds = SupervisedDataSet(2, 1)
ds.addSample((-1, -1), (0,))
ds.addSample((-1, 1), (1,))
ds.addSample((1, -1), (1,))
ds.addSample((1, 1), (0,))
(Fix the validation code at the end of your script accordingly.)
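For example, the checks at the end of the script become:
print n.activate([-1, -1])[0]
print n.activate([-1, 1])[0]
print n.activate([1, -1])[0]
print n.activate([1, 1])[0]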
The trainUntilConvergence method works with a validation set and does something resembling early stopping, which doesn't make sense for such a small dataset. Use trainEpochs instead; 1000 epochs is more than enough for the network to learn this problem:
trainer.trainEpochs(1000)
What learning rate should be used for backprop?: Tune the learning rate parameter. This is something you do every time you employ a neural network. In this case, the value 0.1 or even 0.2 dramatically increases the learning speed:
trainer = BackpropTrainer(n, dataset=ds, learningrate=0.1, verbose=True)
(Note the verbose=True parameter. Observing how the error behaves is essential when tuning parameters.)
With these fixes I get consistent and correct results for the given network with the given dataset, with an error of less than 1e-23.
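For reference, here is the full script with all four fixes applied, assembled from the snippets above (a sketch; the 1000 epochs and learningrate=0.1 are just the values discussed here, not the only ones that work):
#!/usr/bin/env python2
# coding: utf-8
from pybrain.structure import FeedForwardNetwork, LinearLayer, SigmoidLayer, FullConnection, BiasUnit
from pybrain.datasets import SupervisedDataSet
from pybrain.supervised.trainers import BackpropTrainer

# 2-3-1 network, plus a bias unit feeding the hidden layer
n = FeedForwardNetwork()
inLayer = LinearLayer(2)
hiddenLayer = SigmoidLayer(3)
outLayer = LinearLayer(1)
bias = BiasUnit()
n.addInputModule(inLayer)
n.addModule(hiddenLayer)
n.addModule(bias)
n.addOutputModule(outLayer)
in_to_hidden = FullConnection(inLayer, hiddenLayer)
bias_to_hidden = FullConnection(bias, hiddenLayer)
hidden_to_out = FullConnection(hiddenLayer, outLayer)
n.addConnection(in_to_hidden)
n.addConnection(bias_to_hidden)
n.addConnection(hidden_to_out)
n.sortModules()

# XOR, with the inputs recoded as -1/1
ds = SupervisedDataSet(2, 1)
ds.addSample((-1, -1), (0,))
ds.addSample((-1, 1), (1,))
ds.addSample((1, -1), (1,))
ds.addSample((1, 1), (0,))

trainer = BackpropTrainer(n, dataset=ds, learningrate=0.1, verbose=True)
trainer.trainEpochs(1000)

print n.activate([-1, -1])[0]
print n.activate([-1, 1])[0]
print n.activate([1, -1])[0]
print n.activate([1, 1])[0]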