I have the following condition:
if(in_array($needle, $haystack) ||
in_array($needle . "somePostfix", $haystack) ||
in_array($needle . "someOtherPostfix", $haystack) ||
// and some more) {
// do something
}
My haystack contains more than 10k elements, and this check takes about 400ms. I know that in_array
has to iterate over the whole array multiple times. In my case the common case is that the element is not found. and I tried to improve this by creating the following method that only iterates once over the haystack:
function wildcardInArray($needle, $haystack) {
foreach ($haystack as $value) {
if (true === fnmatch($needle . '*', $haystack)) {
return true;
}
}
return false;
}
But this decreases my performance even more, seems to me that fnmatch
is the bottleneck.
Is there any improvement for this case of array search?
You can use your array as 'keys', i.e:
$arr = ['a', 'b', 'c', … ];
to$arr = ['a' => true, 'b' => true, …]
.You will consume more memory but you will have an instant result with
isset($arr[$key]);
.Fastest but biggest in memory, you can use stdClass and
isset($obj->$key);
If you can't change your array structure, tell us if you can sort manually the array contents?
This is a very interesting question that doesn't appear to have a great answer. I did some very unscientific bench-marking and I was not able to get any faster than
in_array
for a$haystack
with 100000 elements.As you can see, only three of these methods took less than 100ms,
needle ===
,in_array
andarray_flip
. And out of these three,in_array
was clearly the fastest. Now the question is how many postfix-es do you have? The running time onin_array
will beO(n*m)
(n
is size of your haystack,m
is the number of postfixes), which is a problem ifm
is also very large. Ifm
is significantly large, sorting the data once and performing a binary search on the sorted list will beO(m*log(n))
which grows much slower, but has a higher initial overhead as shown in the sorting time above. Even better, if you have a very largem
would probably bearray_flip
as each search should only takeO(1)
lookup after the initial flip.CODE
Haystack creation
TESTs
Calling Code
All tests were wrapped with timing code using
microtime(true)
.