in_array() performance optimization

I have the following condition:

if(in_array($needle, $haystack) ||
    in_array($needle . "somePostfix", $haystack) ||
    in_array($needle . "someOtherPostfix", $haystack) ||
    // and some more) {
    // do something
}

My haystack contains more than 10k elements, and this check takes about 400ms. I know that in_array has to iterate over the whole array multiple times. In my case the common case is that the element is not found. and I tried to improve this by creating the following method that only iterates once over the haystack:

function wildcardInArray($needle, $haystack) {
    foreach ($haystack as $value) {
        if (true === fnmatch($needle . '*', $haystack)) {
            return true;
        }
    }
    return false;
}

But this decreases my performance even more, seems to me that fnmatch is the bottleneck.

Is there any improvement for this case of array search?

标签： php performance optimization

2条回答

祖国的老花朵

2楼-- · 2020-06-16 08:51

You can use your array as 'keys', i.e:

$arr = ['a', 'b', 'c', … ]; to $arr = ['a' => true, 'b' => true, …].

You will consume more memory but you will have an instant result with isset($arr[$key]);.

Fastest but biggest in memory, you can use stdClass and isset($obj->$key);

$obj = new stdClass();
$obj->{'a'} = true;
$obj->{'b'} = true;
$obj->{'…'} = true;

If you can't change your array structure, tell us if you can sort manually the array contents?

// generic
$length = strlen($needle);
$char = $needle[0];
$found = false;
$suffixes = [ false, 'somePostfix', 'someOtherPostfix' ];

foreach($haystack as $entry) {
  if ($char === $entry[0] && $needle === substr($entry, 0, $length)) {
    $suffix = substr($entry, $length);
    if (in_array($suffix, $suffixes, true)) {
      $found = true;
      break;
    }
  }
}

0人赞添加讨论(0) 举报

贪生不怕死

3楼-- · 2020-06-16 08:52

This is a very interesting question that doesn't appear to have a great answer. I did some very unscientific bench-marking and I was not able to get any faster than in_array for a $haystack with 100000 elements.

PHP 5.5.9-1ubuntu4.14 (cli) (built: Oct 28 2015 01:34:46) 
Copyright (c) 1997-2014 The PHP Group
Zend Engine v2.5.0, Copyright (c) 1998-2014 Zend Technologies
    with Zend OPcache v7.0.3, Copyright (c) 1999-2014, by Zend Technologies
    with Xdebug v2.2.3, Copyright (c) 2002-2013, by Derick Rethans

Sorting Time*:    0.19367408752441
Imploding Time**: 0.0207359790802
preg_match:       0.10927486419678
needle ===:       0.083639144897461
in_array:         0.019428968429565
array_flip:       0.028955936431885
array_intersect:  0.15198707580566
array_diff:       0.15532493591309

//*sort without search (binary search wouldn't add much time)
//**time it took to implode the array 
//     (no search was performed, this search WOULD take significant time if implemented)

As you can see, only three of these methods took less than 100ms, needle ===, in_array and array_flip. And out of these three, in_array was clearly the fastest. Now the question is how many postfix-es do you have? The running time on in_array will be O(n*m) (n is size of your haystack, m is the number of postfixes), which is a problem if m is also very large. If m is significantly large, sorting the data once and performing a binary search on the sorted list will be O(m*log(n)) which grows much slower, but has a higher initial overhead as shown in the sorting time above. Even better, if you have a very large m would probably be array_flip as each search should only take O(1) lookup after the initial flip.

CODE

Haystack creation

$haystack = array();

function getRandomWord($len = 10) {
        $len = rand(3,10);
        $word = array_merge(range('a', 'z'), range('A', 'Z'));
            shuffle($word);
            return substr(implode($word), 0, $len);
}

$numWords = 100000;
for($i = 0; $i < $numWords; $i++) {
    $haystack[] = getRandomWord();
}

TESTs

//*Sorting*    
$copy = $haystack;
sort($copy);


//implode    
$copy = implode($haystack, " ");


//*preg_match_test*
function preg_match_test($regex, $haystack) {
    $matches = false;
    foreach($haystack as $value) {
        if (preg_match($regex, $value)) {
            $matches = true;
            break;
        }
    }
    return $matches;
}

//needle ===
function equalsNeedle($needles, $haystack) {
    $matches = false;
    foreach ($haystack as $value) {
        foreach($needles as $needle) {
            if ($needle === $value) {
                $matches = true;
                break 2;
            }
        }
    }
    return $matches;
}

//in_array
function baseCase($needles, $haystack) {
    $matches = false;
    foreach($needles as $needle) {
        if (in_array($needle, $haystack)) {
            $matches = true;
            break;
        }
    }
    return $matches;
}

//array_flip
function arrayFlipping($needles, $haystack) {
    $matches = false;
    $copy = array_flip($haystack);
    foreach ($needles as $needle) {
        if (array_key_exists($needle, $copy)) {
            $matches = true;
        }
    }
    return $matches;
}

//array_intersect
function arrayIntersect($needles, $haystack) {
    if (count(array_intersect($needles, $haystack)) > 0) {
        return true;
    }
    return false;
}

//array_diff
function arrayDiff($needles, $haystack) {
    if (count(array_diff($needles, $haystack)) !== count($needles)) {
        return true;
    }
    return false;
}

Calling Code

$array = array("foo","foobar","foobazz","foobuzz");
$base = "foo";
$regex = "/^$base(bizz|bazz|buzz|)$/";

echo "preg_match: ";
preg_match_test($regex, $haystack);
echo "needle === ";
equalsNeedle($array, $haystack);
echo "in_array:  ";
baseCase($array, $haystack);
echo "array_flip:  ";
arrayFlipping($array, $haystack);
echo "array_intersect:  ";
arrayIntersect($array, $haystack);
echo "array_diff:  ";
arrayDiff($array, $haystack);

All tests were wrapped with timing code using microtime(true).

0人赞添加讨论(0) 举报

in_array() performance optimization

CODE

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间