This refers to one of my previous questions: array_unique vs array_flip - This states that array_flip(array_flip()) is much quicker than array_unique() when dealing with simple strings and integers.
What I would like to know is why array_unique() creates a copy of the array, sorts it, then removes the duplicates.
The source for both functions is available here.
Thanks in advance!
If you think about it algorithmically, the way to remove duplicates is to go through a list, keep track of the items you have already seen, and drop anything that is already in that "seen" list. One easy way to accomplish this is to sort the list first: equal items end up next to each other, so each item only needs to be compared with its neighbour. Think about doing it yourself, let alone a computer: which one of these lists is easier to remove duplicates from?
apple
banana
cantaloupe
apple
durian
apple
banana
cantaloupe
or
apple
apple
apple
banana
banana
cantaloupe
cantaloupe
durian
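To make the sorted-list idea concrete, here is a minimal sketch in PHP. The helper name unique_by_sorting() is made up for illustration, and this is not how array_unique() is actually implemented; it only shows why sorting makes the duplicates easy to spot. Note that it returns the values in sorted order and discards the original keys.

```php
<?php
// Hypothetical helper: dedupe by sorting a copy, then skipping adjacent repeats.
function unique_by_sorting(array $items): array
{
    $sorted = $items;   // work on a copy so the caller's array is untouched
    sort($sorted);      // equal values now sit next to each other (keys are reindexed from 0)

    $result = [];
    foreach ($sorted as $i => $item) {
        // Keep an item only if it differs from the one just before it.
        if ($i === 0 || $item !== $sorted[$i - 1]) {
            $result[] = $item;
        }
    }
    return $result;
}

$fruit = ['apple', 'banana', 'cantaloupe', 'apple', 'durian', 'apple', 'banana', 'cantaloupe'];
print_r(unique_by_sorting($fruit));
```

After the sort, each item is compared with a single neighbour, so the duplicate check itself is one pass over the list.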
Edit: After looking into it a bit (and finding this article), it looks like while the two both get the job done, they are not functionally equivalent, or at least they aren't always. To paraphrase a couple of these points:
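- array_unique() sorts a copy of the values internally, as you noted, but the array it returns has its keys preserved and keeps the first occurrence of each value. array_flip(array_flip()) keeps the last key for each value instead, so the two results won't always match, though that might be what you want.
- array_flip() can only turn strings and integers into keys, so values such as objects can't be flipped, i.e. the flip method wouldn't work out of the box on all arrays, whereas the sort method isn't tied to what can be used as a key.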
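A quick way to see where the two approaches part ways is to run both on the same small array. This is only a sketch with made-up sample data. As far as I can tell from the manual, array_unique() keeps the first key for each value while the double flip keeps the last one, and array_flip() skips anything that is not a string or an integer (it raises a warning, which is silenced with @ here).

```php
<?php
$input = ['a' => 'apple', 'b' => 'banana', 'c' => 'apple'];

// array_unique() keeps the first key seen for each value.
$viaUnique = array_unique($input);
print_r($viaUnique);

// The double flip keeps the last key seen for each value.
$viaFlip = array_flip(array_flip($input));
print_r($viaFlip);

// Values that are neither strings nor integers cannot become keys,
// so array_flip() skips them (the warning is suppressed with @).
$mixed = ['x' => 'apple', 'y' => new stdClass()];
$flipped = @array_flip($mixed);
print_r($flipped);
```

So if the keys matter to you, or the array can hold anything other than strings and integers, the two are not interchangeable.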
I think Dan Fego gave a wonderful answer as to why one would sort an array prior to removing duplicates; however, I’d like to examine what array_flip() does. I’ll be using the following array to illustrate:
'a' => 'apple'
'b' => 'banana'
'c' => 'apple'
'd' => 'date'
array_flip() exchanges the keys and values, producing:
'apple' => 'a'
'banana' => 'b'
'apple' => 'c'
'date' => 'd'
However, keys must be unique. The manual describes how array_flip() handles this:
If a value has several occurrences, the latest key will be used as its
values, and all others will be lost.
So we get something like this (the 'apple' entry stays where it first appeared, but now holds the latest key):
'apple' => 'c'
'banana' => 'b'
'date' => 'd'
So if we use array_flip(array_flip()) we get:
'c' => 'apple'
'b' => 'banana'
'd' => 'date'
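If you want to check the walk-through yourself, the following sketch runs the two flips one at a time on the same array and prints each intermediate result. The variable names are just for illustration.

```php
<?php
$fruit = ['a' => 'apple', 'b' => 'banana', 'c' => 'apple', 'd' => 'date'];

// First flip: values become keys, so the duplicate 'apple' collapses
// into a single entry that holds the latest key ('c').
$once = array_flip($fruit);
print_r($once);

// Second flip: swap back so the fruit names are values again.
$twice = array_flip($once);
print_r($twice);
```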
As for the motivation behind array_unique(), we can only speculate unless Rasmus Lerdorf or someone currently working on PHP development cares to answer.