I have 1000 unique objects in a java.util.List
, each referring to an image, each image in the 1000-list is unique and now I'd like to shuffle them, so that I can use the first 20 objects and present them to the website-user.
The user can then click a button saying "Shuffle", and I retrieve the 1000 images again from scratch and calling again shuffle()
.
However, it seems that out of 1000 image objects, I very often see the same image again and again between the 20-image-selections.
Something seems to be wrong, any better suggestion, advices?
My code is very simple:
List<String> imagePaths = get1000Images();
Collections.shuffle(imagePaths);
int i = 0;
for (String path: imagePaths) {
... do something with the path ...
i++;
if (i >= 20) break;
}
I know that Collections.shuffle()
is well distributed:
see for instance http://blog.ryanrampersad.com/2012/03/03/more-on-shuffling-an-array-correctly/
However, I just have the feeling that the probability of seeing the same image over and over again in a set of 20 images out of 1000 should be much less...
Inputs highly appreciated.
If you're showing 20 images out of 1000 the probability of seeing any one of that 20 repeated in the next iteration is approximately 0.34 so you shouldn't be surprised to see images repeating.
The chances of seeing a specific image is still one in a thousand, but if you're looking for twenty images the chances are much higher.
We can calculate the probability of none of the previous 20 images repeating as:
980 979 961
———— × ——— × ... × ——— ≈ 0.66
1000 999 981
And so the probability of seeing a repeat is one minus this, or approximately 0.34.
And the probability of seeing an image repeated in either of the next two iterations is:
1 - (0.66 × 0.66) ≈ 0.56
In other words, it's more likely than not that you'll see a repeated image over the two following cycles. (And this isn't including images repeated from the second cycle in the third which will only make it more likely.)
For what it's worth, here's some Java code to do the above calculation:
float result = 1.0f;
int totalImages = 1000;
int displayedImages = 20;
for (int i = 0; i < displayedImages; i++) {
result = result * (totalImages - displayedImages - i) / (totalImages - i);
}
System.out.println(result);
Its human nature to see patterns which are not there. Many people see patterns in the planets and stars as guiding their life.
In the first 1000 digits of PI there are six nines in a row. Does that mean the digits of PI are not random? no. The pattern doesn't occur again any more than your might expect.
Having said that, Random is not completely random and it will repeat after 2^48 calls. (it uses a 48-bit seed) This means its not possible to produce every possible long
or double
using it. If you want more randomness you can use SecureRandom with shuffle instead.
It sounds like what you want is something like this
List<String> imagePaths = new ArrayList<>();
// called repeatedly
if (imagePaths.size() <= 500) {
imagePaths = get1000Images();
Collections.shuffle(imagePaths);
}
for (String path: imagePaths.subList(0, 20)) {
... do something with the path ...
}
imagePaths = imagePaths.subList(20, imagePaths.size());
This will ensure that you don't see the same image in the last 500 calls.
Your intuition is correct for a specific image [you are not likely to see a specific image over and over again], but not for a general image [you are likely to see some image repeating]. This is one of these places in probability that our automatic intuition is wrong...
This reminds me the birthday paradox, which contradicts the intuition, and says - for a group of 23 people, the likelihood of 2 of them having the same birthday is 0.5, much more then the intuition expects!
I did a 52 card shuffle four different times and marked every time each iteration repeated the exact same card in the exact same slot, which gave me approximately 14 out of 208 cards, which was approximately 93.3% random.
Following your question I wrote the following program. I created list of sequential integers and shuffled it 10, 100, 1000 and 10000 times. After every series of shuffles I checked value of element in 5th position of the array and created array of counters: how many times each number appears at 5th position.
Here is the program:
public class MyTest {
public static void main(String[] args) {
int n = 10;
List<Integer> list = new ArrayList<Integer>();
for (int i = 0; i < n; i++) {
list.add(i);
}
int[] counters = new int[n];
for(int shuffles : new int[] {10, 100, 1000, 10000}) {
Arrays.fill(counters, 0);
for (int i = 0; i < shuffles; i++) {
Collections.shuffle(list);
// check 5-th element
int fifth = list.get(5);
counters[fifth] = counters[fifth] + 1;
}
System.out.println(shuffles + ": " + Arrays.toString(counters));
}
}
}
And here are the results:
10: [0, 1, 1, 1, 2, 0, 0, 3, 2, 0]
100: [11, 9, 9, 7, 10, 12, 13, 13, 8, 8]
1000: [100, 101, 107, 101, 95, 96, 109, 83, 93, 115]
10000: [1015, 942, 990, 1003, 1015, 1037, 977, 1060, 950, 1011]
As you can see the "randomality" depends on number of shuffles. If you shuffle array 10 times the minimal counter is 0 and the maximal is 3.
The difference between these values for 100 shuffles (in per cents) much smaller.
The numbers a almost the same for 10000 shuffles.
I think that this test models your use-case: you are showing images in specific position of shuffled collection.
Please see post of @amit that describes the meaning of shuffle.
So, the solution for you is to shuffle your array 10 times.
EDIT: @Dave Webb gave perfect explanation for the case.
The second thinking is the following: you actually do not have to shuffle you list of 1000 elements to take 20 first element from it. It is enough to take 20 random elements. You will get the same effect but much more effective solution:
Set<Image> show = new HashSet<Image>();
Random r = new Random(System.currentTimeMillis());
for (int i = 0; show.size() < 20; i++) {
show.add(list.get(r.nextInt()));
}
With that code, if you're seeing the same image over and over, it means the same image exists many times in the list. Whereever you're getting your 1000 images from, there are duplicates.