可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
It's really all in the title, but here's a breakdown for anyone who is interested in Evolutionary Algorithms:
In an EA, the basic premise is that you randomly generate a certain number of organisms (which are really just sets of parameters), run them against a problem, and then let the top performers survive.
You then repopulate with a combination of crossbreeds of the survivors, mutations of the survivors, and also a certain number of new random organisms.
Do that several thousand times, and efficient organisms arise.
Some people also do things like introduce multiple "islands" of organisms, which are seperate populations that are allowed to crossbreed once in awhile.
So, my question is: what are the optimal repopulation percentages?
I have been keeping the top 10% performers, and repopulating with 30% crossbreeds and 30% mutations. The remaining 30% is for new organisms.
I have also tried out the multiple island theory, and I'm interested in your results on that as well.
It is not lost on me that this is exactly the type of problem an EA could solve. Are you aware of anyone trying that?
Thanks in advance!
回答1:
I initially tried to model what I thought organic systems were like. Ultimately decided that was no good, and went more aggressive, with 10% kept, 20% mutated, 60% crossbred, and 10% random.
Then I noticed my top 10% were all roughly identical. So I increased the random to 30%. That helped some, but not much.
I did try multiple island, and generation-skipping, and reseeding, which gave better results, but still highly unsatisfactory, very little variation in the top 10%, crazy-long numbers of generations to get any results. Mostly the code learned how to hack my fitness evaluation.
It's really easy to get top performers, so don't worry about keeping too many of them around. Crossbreeds help to pare down positive and negative traits, so they're useful, but really what you want to get is a lot of good random bred in. Focus on mutations and new randoms to bring in features, and let the crossbreeds and top performers just keep track of the best and refine them more slowly. IE: stuff based on the last generation is just finding a better local maxima, randoms find better global maxima.
I still believe optimal answers to your question can be found by observing natural phenomena, such as in a recent article regarding randomness of fruit-fly flight paths, so that may pan out.
Probably the best answer is to just run it and tweak it, don't be afraid to tweak it pretty heavily, the populations are robust. Make sure you implement a way to save and continue.
回答2:
The best resources I've come across for GA and EA were John Koza's books on Genetic Programming. He covers the topic in depth - techniques for encoding the genome, random mutation, breeding, tuning the fitness function.
Personally I've only written a small handful of simulators for pedagogical purposes. What I found was that how I tuned those percentages was related to the particulars of the fitness function I was using, how much random mutation I had introduced and how 'smart' I had tried to make the mutation and breeding - I found that the less 'smart' I tried to make the mutator and the cross-over logic, the faster the population improved its fitness score - I also found that I had been too conservative in the probability of mutation -- my initial runs hit local maxima and had a hard time getting out of them.
None of this gives you concrete answers, but I don't think there are concrete answers, GA is unpredictable by its nature and tuning those kinds of parameters may still be a bit of an art. Of course you could always try a meta-GA, using those parameters as a chromosome, searching for settings that produce a more rapid fitness in the base GA you're running.
Depends on how 'meta' you want to get.
回答3:
This is a hotly-debated (in the literature and Melanie, et al books) topic that seems to be very domain-specific. What works for one problem of one type with n parameters will almost never work for another problem, another domain, or another parametric set.
So, as TraumaPony suggested, tweak it yourself for each problem you are solving or write something to optimize it for you. The best thing you can do is keep track of all of your "knob-twiddling" and fine-tuning experiments so you can map out the solution terrain and get a feel for how to optimize within that space quickly. Also try alternative techniques like hill-climbing so you can have a baseline to beat.
@Kyle Burton: crossover vs. mutation rates are also constantly debated in each class of problems handed over to GAs and GPs.
回答4:
Assuming you have a method for quantifying the top X% percent performers, I'd suggest that instead of using a hard coded threshold you analyze the performance distribution and make your cutoff somewhere in the range of the first major drop in performance, and then tuning your crossbreads, mutations, and new organisms to fill in the gaps. This way if you have a very "productive" run in which lots of variations were successful you don't throw a significant number of high performers. Also, if you have an "unproductive" run you can scrap more of the existing organisms in favor of more newer organisms that should be taking their place.
回答5:
I have had some success increasing the diversity of population by setting mutation and crossover from a couple of the genes from the parent chromosomes.
This works until the mutation rate drops to zero; since it is likely that there will be a periodic evolutionary pressure to do this, you should try and make sure these genes have a minimum rate.
In practice, I opted for a multi-chromosome genotype. One chromosome coded for the other's reproductive function. The smaller 'reproduction chromosome' had a sensible fixed rates for mutation and crossover.
I found that this would stop the classic plateau and convergence of the population.
As an aside, I tend to do both crossover and mutation for each child.
For generational GAs, I try to shun elitism altogether, but where populating from multiple islands, I keep the top elite from each island. When the islands come together, then the elites can all breed together.
回答6:
There would appear to be a few answers suggesting using a 2nd GA to determine optimum parameters for the 1st GA, with no mention of how to determine the optimum parameters for the 2nd. I can't help but wonder about the religious beliefs of those suggesting this approach...
回答7:
As others have mentioned, the optimal mix would depend on your specific problem and other problem-specific factors such as the size of the solution space.
Before we discuss the evolution breakdown from one generation to the next, it's important to consider the size of each generation. Generally my approach is to start with a fairly large population (~100k-500k individuals) of fairly diverse individuals, which is something that Koza suggests in some of his work. To obtain this diversity from the start, you could divide up your solution space into buckets, and then ensure that at least a certain number of individuals falls into each bucket. (E.G. if you have a tree representation for each individual, ensure that equal amounts are created of depth 2, 3, ..., max_depth)
As far as your actual question, there's no clear way to approach it, but depending on your problem, you may want to emphasize randomness or de-emphasize it. When you want to emphasize it, you should keep less indivuduals intact, and introduce a higher number of new random individuals. You would generally like to do this if there are many local maximums in your solution space and you want to have a broader search.
When you obtain a break down there are a few things to consider... for one, duplication (a lot of identical or newly identical individuals at the top inbreeding). To reduce this you may want to sweep through your population between generations and replace duplications with new random individuals or crossbred ones.
That said, my current approach is to keep the top 1%, crossbreed the top 20% into a new 20%, crossbreed the top 40% into the next 20%, crossbreed the top 90% to generate the next 20%, and randomly generate the rest (39%). If there are duplicates, I remove them and replace them with new random individuals.
I don't use mutations because the high number of random individuals should take care of adding in "mutations" during the following crossbreeding.
回答8:
You know what you could do... You could write a genetic algorithm to determine that optimal distribution.
But, usually I keep the top 12%, and 28% crossbreeds; with 30% each for the others.