I decided to go with a neural network to create behaviors for an animation engine that I have. For every body part, the network takes in three Vector3s and one Euler angle: the first Vector3 is the position, the second is the velocity, and the third is the angular velocity; the Euler angle is the body part's rotation. I have 7 body parts, and each of those data types is 3 floats, so 7 × 4 × 3 = 84 inputs for my neural network. The outputs are mapped to the character's muscles: they give the amount of strength to apply to each muscle, and there are 15 of them.
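To make the input layout concrete, here is a small Python sketch of the flattening described above (the asker's project is C#; the field names and dict layout here are hypothetical, just to show the 7 × 4 × 3 = 84 arithmetic):

```python
def build_input_vector(body_parts):
    """Flatten per-part state into one flat input list.

    Each part contributes 4 triples of floats: position, velocity,
    angular velocity, and Euler rotation -> 12 floats per part.
    """
    inputs = []
    for part in body_parts:
        inputs.extend(part["position"])          # Vector3
        inputs.extend(part["velocity"])          # Vector3
        inputs.extend(part["angular_velocity"])  # Vector3
        inputs.extend(part["rotation"])          # Euler angles (3 floats)
    return inputs

# 7 body parts with dummy state -> 7 * 4 * 3 = 84 inputs
parts = [{"position": [0.0, 1.0, 0.0],
          "velocity": [0.0, 0.0, 0.0],
          "angular_velocity": [0.0, 0.0, 0.0],
          "rotation": [0.0, 0.0, 0.0]} for _ in range(7)]
assert len(build_input_vector(parts)) == 84
```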
I am running 15 networks simultaneously for 10 seconds each, rating their fitness on the lowest energy use, the least amount of x and z movement, and whether the body parts are in the correct y positions relative to each other (hips.y > upperLeg.y, upperLeg.y > lowerLeg.y, etc.), and then running them through a genetic algorithm. I was using a network with 8 hidden layers of 168 neurons each. I'm trying to get the character to stand up straight and not move around too much. I ran this for 3000 generations and didn't even come close.
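In pseudocode terms, the fitness described above combines three signals. Here is a hypothetical Python sketch of how such a function might be combined (the field names, weights, and frame format are all assumptions, not the asker's actual code):

```python
def fitness(frames, energy_used):
    """Hypothetical per-run fitness combining the three criteria:
    correct vertical ordering of body parts, low x/z drift, low energy.

    `frames` is a list of per-frame dicts with body-part y positions
    and the hip's horizontal offset from the start position.
    """
    score = 0.0
    for f in frames:
        # Reward the hips > upper leg > lower leg ordering.
        if f["hips_y"] > f["upper_leg_y"] > f["lower_leg_y"]:
            score += 1.0
        # Penalise horizontal drift from the starting point.
        score -= 0.1 * (abs(f["dx"]) + abs(f["dz"]))
    # Penalise total energy use (weight chosen arbitrarily here).
    return score - 0.01 * energy_used
```

How the three terms are weighted relative to each other matters a great deal, which is part of what the second answer below criticises.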
The neural network and genetic algorithm are C# versions of this tutorial. I changed the crossover method from one-point to blending.
I have 84 inputs and 15 outputs. How large should my Neural Network be?
The problem you want to solve is quite a tricky one; I doubt that any "vanilla" GA (especially one that uses a fixed architecture for the networks) will solve it in a reasonable time. I also don't think you will ever find the "right" number of neurons for the hidden layers.
However, if you are willing to spend some time on it, have a look at HyperNEAT for Locomotion Control in Modular Robots, which deals with more or less the same problem.
They use a fairly advanced evolutionary technique called HyperNEAT and report some good results.
HyperNEAT is built on top of NEAT (NeuroEvolution of Augmenting Topologies). NEAT evolves not only the weights of ANNs but also their structure: it starts with simple networks and gradually makes them more complex until you reach your goal (or give up).
NEAT is then slightly altered to be able to use various activation functions, which enables it to produce a wide variety of "patterns" when applied to a set of points, e.g. in a coordinate system. The patterns can have some interesting traits, like perfect/imperfect symmetry, or they can be periodic. This variant is called a Compositional Pattern-Producing Network, or CPPN. A spectacular application of this technique is PicBreeder, where the networks are used to "draw" pictures.
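To give a feel for why mixed activation functions produce such patterns, here is a tiny hand-wired Python sketch. A real CPPN's structure and weights are evolved by NEAT; the weights below are arbitrary, chosen only to show how a sine gives periodicity and a Gaussian gives symmetry:

```python
import math

def cppn(x, y):
    """Toy CPPN: a function of coordinates built from mixed activations."""
    h1 = math.sin(2.0 * x)              # sine -> periodic pattern along x
    h2 = math.exp(-(x * x + y * y))     # Gaussian -> symmetry about the axes
    return math.tanh(1.5 * h1 + 2.0 * h2)

# Sampling it over a grid yields a "picture" with built-in regularities;
# note it is mirror-symmetric in y because y only appears squared.
pattern = [[cppn(x / 10.0, y / 10.0) for x in range(-10, 11)]
           for y in range(-10, 11)]
```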
In HyperNEAT, CPPNs are used to create other ANNs. The hidden layer of the new network is represented by a so-called substrate, which you can imagine as the layer's neurons placed in a 2D/3D coordinate system. Then, for each possible pair of neurons (every input to every hidden neuron, every hidden neuron to every output), the CPPN is queried to determine the connection weight. Thus we have an indirect encoding, which:
- is itself small,
- can produce arbitrarily big networks,
- can show quite complex behavior, and
- lets patterns that show up in reality/nature (again: symmetry, periodic behavior) emerge relatively easily. Note that for animation/effective locomotion, both of these are very advantageous (if not a must).
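The substrate idea above can be sketched in a few lines of Python. Here `cppn4` stands in for an evolved CPPN (its body is an arbitrary fixed function, purely illustrative), and the coordinates and pruning threshold are assumptions:

```python
import math

def cppn4(x1, y1, x2, y2):
    """Stand-in for an evolved CPPN: maps a pair of neuron
    coordinates to a connection weight."""
    return math.tanh(math.sin(x1 - x2) + math.exp(-((y1 - y2) ** 2)))

def build_weights(input_coords, output_coords, threshold=0.2):
    """Query the CPPN for every (input, output) pair of substrate
    neurons; only connections above the threshold are expressed."""
    weights = {}
    for i, (x1, y1) in enumerate(input_coords):
        for j, (x2, y2) in enumerate(output_coords):
            w = cppn4(x1, y1, x2, y2)
            if abs(w) > threshold:
                weights[(i, j)] = w
    return weights

# Neurons placed on two lines of a 2D substrate (geometry is arbitrary here).
inputs = [(x / 3.0, -1.0) for x in range(-3, 4)]   # 7 input neurons
outputs = [(x / 2.0, 1.0) for x in range(-2, 3)]   # 5 output neurons
w = build_weights(inputs, outputs)
```

The point is the indirection: the small CPPN is what evolves, while the expressed network can be as large as the substrate you lay out.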
All in all, it would give you a chance to solve your complex problem.
As you can see, there are several layers to this technique, so implementing it on your own is not that easy. Luckily there are some good implementations; you can find them on the NEAT home page, together with a lot of other docs, papers, and tutorials.
Interesting approach! I've been thinking about something similar for a while, and I'd love to hear what results you get.
You'll have to test, but I'd guess you have too many hidden layers. I think this application could work with one or two at most.
You should also take a look at your fitness function. I suspect it might be "too difficult" to provide a learning gradient, in the sense that at the beginning the character has no hope of standing up, so the "local minimum" you fall into is learning how to fall down with the least effort. Not very useful. GAs in general suffer from local minima quite a lot.
To improve your fitness function, I'd try something like penalising deviation from upright every frame. That gives some credit to solutions that manage partial balance, so there is an improvement path. I wouldn't bother with energy use at all until you've got them balancing.
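As a Python sketch of that shaping idea (the target height and frame format are hypothetical):

```python
def shaped_fitness(head_heights, target_height=1.6):
    """Penalise deviation from upright every frame, so a run that
    stays partially balanced scores better than one that falls flat."""
    return -sum(abs(h - target_height) for h in head_heights)

# A run that wobbles near upright beats one that falls immediately,
# giving the GA a gradient to climb long before full balance is reached.
standing = [1.6, 1.58, 1.55, 1.57]
falling = [1.6, 1.2, 0.6, 0.3]
assert shaped_fitness(standing) > shaped_fitness(falling)
```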