I think I've settled on this as the most simple and unit-testable method for randomising a list, but would be interested to hear of any improvements.
public static IList<T> RandomiseList<T>(IList<T> list, int seed)
{
Random random = new Random(seed);
List<T> takeFrom = new List<T>(list);
List<T> ret = new List<T>(takeFrom.Count);
while (takeFrom.Count > 0)
{
int pos = random.Next(0, takeFrom.Count - 1);
T item = takeFrom[pos];
takeFrom.RemoveAt(pos);
ret.Add(item);
}
return ret;
}
You want a shuffle, and the best way to do that is the Fisher-Yates shuffle:
public static IList<T> Randomise<T>(IList<T> list, int seed)
{
Random rng = new Random(seed);
List<T> ret = new List<T>(list);
int n = ret.Length;
while (n > 1)
{
n--;
int k = rng.Next(n + 1);
// Simple swap of variables
T tmp = list[k];
ret[k] = ret[n];
ret[n] = tmp;
}
return ret;
}
I liked Dennis Palmers idea of returning a shuffled IEnumerable instead of shuffle the list in place, but using the RemoveAt method makes it slow. Here is an alternative without the RemoveAt method:
public static IEnumerable<T> Shuffle<T>(IEnumerable<T> list, int seed) {
Random rnd = new Random(seed);
List<T> items = new List<T>(list);
for (int i = 0; i < items.Count; i++) {
int pos = rnd.Next(i, items.Count);
yield return items[pos];
items[pos] = items[i];
}
}
I thried this with 10000 integers, and it's about 30 times faster.
Not sure how much of an improvement this is, but would have performance benefits if the list is large and you only need the first few random items.
public static IEnumerable<T> RandomiseList<T>(IList<T> list, int seed)
{
Random random = new Random(seed);
List<T> takeFrom = new List<T>(list);
while (takeFrom.Count > 0)
{
int pos = random.Next(0, takeFrom.Count - 1);
T item = takeFrom[pos];
takeFrom.RemoveAt(pos);
yield return item;
}
}
Removes the need for a temp list or even a temp swap variable.
If I was going to be using this a lot, I'd rewrite it as an extension method.
This looks good to me. Note that you'll get slightly better performance (especially for large lists) if you initialize ret
with the length of list
, so that the list doesn't have to be reallocated:
List<T> ret = new List<T>(list.Count);
What sort of suggestions are you looking for exactly? efficiency? correctness? You do mention unit testing ... I think there could definitely be an improvement there.
I actually helped develop an online game and their shuffling mechanism. I don't really suspect performance is much of an issue, as most algorithms you find are by and large the same. I would suggest the following however,
a. create a random interface
public interface IRandom
{
byte NextRandomByte ();
}
Anything that now consumes this interface can now be mocked\unit tested in a controlled manner or environment. You do not really want to be unit testing truly random algorithms - you won't be able to verify your data!
As for why return a byte, a byte is likely the smallest unit of randomness you could want. Not only that, but if given a means of generating a single random byte, generating a sequence of them and concatenating them together is an easy way of generating an even wider range of random data.
Of course, you will have to be wary of introduing bias to your data ...
b. Ensure quality of data by reducing bias over arbitrary intervals. Assuming underlying data is uniformly random, any interval that is NOT a factor of 256 will introduce bias. Consider this,
// 250 is not a factor of 256!
byte a = random.NextRandomByte () % 250; // values 0-5 are biased!
In the preceeding snippet, values 0-5 have a 2/255 probability to come up, while values 6-249 have a 1/255 probability to come up. That is a significant bias over time. One approach is to check the number coming from the generator, and discard it if it exceeds an acceptable range
// continually generate random data until it is satisfactory
for (byte r = random.NextRandomByte (); r > 250; r = random.NextRandomByte ())
{
}
byte a = r % 250; // r is guaranteed to be on [0, 250], no longer bias
"Acceptable range" may be determined by finding the greatest multiple of your interval that can be represented by your value type. A more generalized form
byte modulo; // specified as parameter
byte biasThreshold = (byte.MaxValue / modulo) * modulo;
for (; unbiasedValue >= biasThreshold; )
{
// generate value
unbiasedValue = random.NextRandomByte ();
}
And if you want values greater than byte, simply concatenate the values together,
int modulo; // specified as parameter
int biasThreshold = (int.MaxValue / modulo) * modulo;
for (; unbiasedValue >= biasThreshold; )
{
// generate value
byte a = random.NextRandomByte ();
byte b = random.NextRandomByte ();
...
int unbiasedValue = a << 24 + b << 16 + c << 8 + d;
}
c. Consume! Place your algorithms or helpers in stateless extension or static classes, like
// forgive my syntax, recalling from memory
public static class IRandomExtensions
{
public int GetUnbiasedInteger (this IRandom random, int modulo) { }
public int GetUnbiasedUnsignedInteger (this IRandom random, uint modulo) { }
public int GetUnbiasedLong (this IRandom random, long modulo) { }
public int GetUnbiasedUnsignedLong (this IRandom random, ulong modulo) { }
...
}
public static class IEnumerableExtensions
{
public IEnumerable<T> Shuffle<T>(this IEnumerable<T> items, IRandom random)
{
// shuffle away!
...
}
}
Deciding whether or not to implement these as methods on your interface or as external methods [as i've done] is up to you - but keep in mind, making them member methods forces implementors to repeat or duplicate code. Personally, I like extensions. They are very clean. And sexy.
int randomNumber = random.UnbiasedInteger (i - 1);
List<int> shuffledNumbers = numbers.Shuffle (random);
Clearly all of the preceeding is optional, but facilitates unit testing and improves overall quality of your random data.
Random and "fair" dice is a very interesting topic in general. If you are at all interested, I strongly recommend you Google it sometime and perform some research. :)
No stats to support this but it would seem better if your return value starts as an array of the same length as the list and then you insert values directly into a randomly generated index.
Be aware of the risks of naive shuffling algorithms, that look good, but do not stand up to testing!
Check this excellent article for an example.