I am having trouble generating random letters based on probability.
For example, the letters J, K, Q, Y, Z each have a probability of 1/96 of occurring. A similar process (with higher probabilities) is used for other letters.
Can somebody show me how to do this?
Edit to be specific: I'm writing a method called "getRandomLetter" that returns a char of a random letter based on a probability fraction.
The most eastetically pleasing solution would require a compact container for given letter's occurrence probabilities. I'd suggest using a HashMap that would serve as a probability function (discrete distribution function). Like this:
For sake it would be good to make sure, that overall sum of all probabilities is equal to
1.0
, but the numbers can be treated as probability weigths and normalized at the end. You get the idea, right?A purely mathematical aproach would require creating a cumulative distribution function, reverting it and then explicite using of that function. That way you could provide a solution of generating any random values with pretty much any probability distribution.
Let's try doing it at once:
Now to use the map just call
The typical way to select from a discrete set of elements with specific probabilities is to choose a random floating-point number and find out which range it lies in. I'll explain with an example. Suppose that you're choosing among three letters, A, B, and C, with probabilities 0.255, 0.407, and 0.338 respectively. You would compute a random number between 0 and 1
and first compare it to the range from 0 to 0.255:
then to the range from 0.255 to (0.255 + 0.407):
and if it's not either of those, it has to be
'C'
:If you're doing this with all 26 letters of the alphabet, it will be a pain to write out all 26 cases of the
if
-else
statement. What you could do in advance is prepare an array of the characters and their respective probabilities,and then you can automate all that
if
-ing with a loop like this:In your case, if all your probabilities are multiples of 1/96, then you can do the same thing choosing a random integer less than 96 instead of a floating-point number. Just use
int
s instead ofdouble
s, and usernd.nextInt(96)
to choose an integer between 0 and 95, inclusive, instead ofMath.random()
. Also, yourprobabilities
array would contain the actual probability times 96.Now, if you're doing something like drawing Scrabble tiles out of a bag, then it becomes trickier because that is a sampling process without replacement, i.e. the probabilities change after every draw. I think a better method in that case would be to actually use a collection to simulate the bag, and then actually add one copy of the letter for each tile that has that letter on it. You can still do this in a loop using the same
chars
andprobabilities
arrays from before:Then you can
bag.shuffle()
to randomize the tiles, andbag.pop()
lets you pick one at random.Here's some documentation on generating random numbers in java.
Now, let's say you generate a random integer between 0 and 95 inclusive(96 possible variants)
you can then map each of your letters to one of those numbers. a simple and dirty way to do it would be a switch statement
another simple way to do it would be to use an array of chars.
http://www.asciitable.com/
What you could do is something like this:
make an array or list of these dictionaries and foreach them
then,
the more times you want it to be likely to be picked, the more times you add it in the list.
ALSO: you can replace chars with strings of one length if you want.
EDIT: here's the old way to add to letters:
etc.