I am implementing a small genetic algorithm framework - primarily for private use, unless I manage to make something reasonable at which time I will post it as open source. Right now I am focusing on selection techniques. So far I have implemented roulette wheel selection, stochastic universal sampling and tournament selection. Next on my list is rank based selection. I had a little more difficulty finding information about that than the other techniques that I've already implemented, but here is my understanding so far.
When you have your population from which you want to get reasonable parents for the next round, you first go through it and divide the
fitness of each individual by the total fitness in the population.
Then you use some other selection technique (such as roulette wheel) to actually determine whom to select for breeding.
Is this correct? If so, am I right in thinking that the rank adjustment is a sort of preprocessing step which then has to be followed by an actual selection procedure that picks out the candidates? Please correct me if I have misunderstood any of this. I am grateful for any additional pointers.
What you've described there is roulette wheel selection, not rank selection. To do rank selection, rather than weighting each candidate by its fitness score, you weight it by its "rank" (that is, best, second best, third best, etc.).
For instance, you might give the first one a weighting of 1/2, the second a weighting of 1/3, the third a weighting of 1/4, etc. Or the worst gets weighting 1, the second worst gets weighting 2, etc.
The important point is that absolute or relative fitness scores are not considered, only the rankings. So the best is more likely to be chosen than the second best, but the two have the same probabilities of being chosen whether the best had ten times the score of the second best, or only had a slightly greater score.
What have you described is simply Roulette wheel selection.In Roulette wheel selection:
- Parents are selected according to their fitness
- The better the chromosomes are, the more chances to be selected they have.
Imagine a roulette wheel where all chromosomes in the population are placed, each chromosome has its place big accordingly to its fitness function, like on the following picture.
This selection will have problems when the finesses differs very much.
Outstanding individuals will introduce a bias in the beginning of the search that may cause a premature convergence and a loss of diversity.
For Example:
if an initial population contains one or two very fit but not the best
individuals and the rest of the population are not good, then these
fit individuals will quickly dominate the whole population and prevent
the population from exploring other potentially better individuals.
Such a strong domination causes a very high loss of genetic diversity
which is definitely not advantageous for the optimization process.
But in Rank Selection:
- Rank selection first ranks the population and then every chromosome receives fitness from this ranking.
The worst will have fitness 1, second worst 2 etc. and the best will have fitness N (number of chromosomes in population).
After this all the chromosomes have a chance to be selected.
- Rank-based selection schemes can avoid premature convergence.
- But can be computationally expensive because it sorts the populations based on fitness value.
- But this method can lead to slower convergence, because the best chromosomes do not differ so much from other ones.
So the process will be:
First sort the Fitness value of the Population.
Then if the Population number is 10 then give the probability of selection to the Population like 0.1,0.2,0.3,...,1.0 .
- Then calculate cumulative Fitness and make roulette wheel.
- And the next steps is same as roulette wheel.
My Implementation of Rank Selection in Matlab:
NewFitness=sort(Fitness);
NewPop=round(rand(PopLength,IndLength));
for i=1:PopLength
for j=1:PopLength
if(NewFitness(i)==Fitness(j))
NewPop(i,1:IndLength)=CurrentPop(j,1:IndLength);
break;
end
end
end
CurrentPop=NewPop;
ProbSelection=zeros(PopLength,1);
CumProb=zeros(PopLength,1);
for i=1:PopLength
ProbSelection(i)=i/PopLength;
if i==1
CumProb(i)=ProbSelection(i);
else
CumProb(i)=CumProb(i-1)+ProbSelection(i);
end
end
SelectInd=rand(PopLength,1);
for i=1:PopLength
flag=0;
for j=1:PopLength
if(CumProb(j)<SelectInd(i) && CumProb(j+1)>=SelectInd(i))
SelectedPop(i,1:IndLength)=CurrentPop(j+1,1:IndLength);
flag=1;
break;
end
end
if(flag==0)
SelectedPop(i,1:IndLength)=CurrentPop(1,1:IndLength);
end
end
Note: you can also see my question regarding rank selection here in this link and my article here.
I was also a bit confused by various sources on how the probabilities are being computed when using the Linear Ranking Selection, sometimes also called "Rank Selection" as is referred to here. At least I hope these two are referring to the same thing.
The part which was elusive to me is the sum of the ranks which seems to have been omitted or at least not explicitly stated in most of the sources. Here I present a short, yet a verbose Python example of how the probability distribution is computed (those nice charts you often see).
Assuming these are some example individual fintesses: 10, 9, 3, 15, 85, 7.
Once sorted, assign the ranks in ascending order: 1st: 3, 2nd: 7, 3rd: 9, 4th: 10, 5th: 15, 6th: 85
The sum of all ranks is 1+2+3+4+5+6 or using the gauss formula (6+1)*6/2 = 21.
Hence, we compute the probabilities as: 1/21, 2/21, 3/21, 4/21, 5/21, 6/21, which you can then express as percentages:
Take note that this is not what is used in actual implementations of genetic algorithms, only a helper script to give you better intuition.
You can fetch this script with:
curl -o ranksel.py https://gist.githubusercontent.com/kburnik/3fe766b65f7f7427d3423d233d02cd39/raw/5c2e569189eca48212c34b3ea8a8328cb8d07ea5/ranksel.py
#!/usr/bin/env python
"""
Assumed name of script: ranksel.py
Sample program to estimate individual's selection probability using the Linear
Ranking Selection algorithm - a selection method in the field of Genetic
Algorithms. This should work with Python 2.7 and 3.5+.
Usage:
./ranksel.py f1 f2 ... fN
Where fK is the scalar fitness of the Kth individual. Any ordering is accepted.
Example:
$ python -u ranksel.py 10 9 3 15 85 7
Rank Fitness Sel.prob.
1 3.00 4.76%
2 7.00 9.52%
3 9.00 14.29%
4 10.00 19.05%
5 15.00 23.81%
6 85.00 28.57%
"""
from __future__ import print_function
import sys
def compute_sel_prob(population_fitness):
"""Computes and generates tuples of (rank, individual_fitness,
selection_probability) for each individual's fitness, using the Linear
Ranking Selection algorithm."""
# Get the number of individuals in the population.
n = len(population_fitness)
# Use the gauss formula to get the sum of all ranks (sum of integers 1 to N).
rank_sum = n * (n + 1) / 2
# Sort and go through all individual fitnesses; enumerate ranks from 1.
for rank, ind_fitness in enumerate(sorted(population_fitness), 1):
yield rank, ind_fitness, float(rank) / rank_sum
if __name__ == "__main__":
# Read the fitnesses from the command line arguments.
population_fitness = list(map(float, sys.argv[1:]))
print ("Rank Fitness Sel.prob.")
# Iterate through the computed tuples and print the table rows.
for rank, ind_fitness, sel_prob in compute_sel_prob(population_fitness):
print("%4d %7.2f %8.2f%%" % (rank, ind_fitness, sel_prob * 100))