I have found that array_key_exists is over 1000x slower than isset at checking whether a key is set in an array reference. Can anyone who understands how PHP is implemented explain why this is true?
EDIT: I've added another case that seems to point to the cause being the overhead of calling functions with a reference.
Benchmark Example
function isset_( $key, array $array )
{
    return isset( $array[$key] );
}

$my_array = array();
$start = microtime( TRUE );
for( $i = 1; $i < 10000; $i++ ) {
    array_key_exists( $i, $my_array );
    $my_array[$i] = 0;
}
$stop = microtime( TRUE );
print "array_key_exists( \$my_array ) ".($stop-$start).PHP_EOL;
unset( $my_array, $my_array_ref, $start, $stop, $i );

$my_array = array();
$start = microtime( TRUE );
for( $i = 1; $i < 10000; $i++ ) {
    isset( $my_array[$i] );
    $my_array[$i] = 0;
}
$stop = microtime( TRUE );
print "isset( \$my_array ) ".($stop-$start).PHP_EOL;
unset( $my_array, $my_array_ref, $start, $stop, $i );

$my_array = array();
$start = microtime( TRUE );
for( $i = 1; $i < 10000; $i++ ) {
    isset_( $i, $my_array );
    $my_array[$i] = 0;
}
$stop = microtime( TRUE );
print "isset_( \$my_array ) ".($stop-$start).PHP_EOL;
unset( $my_array, $my_array_ref, $start, $stop, $i );

$my_array = array();
$my_array_ref = &$my_array;
$start = microtime( TRUE );
for( $i = 1; $i < 10000; $i++ ) {
    array_key_exists( $i, $my_array_ref );
    $my_array_ref[$i] = 0;
}
$stop = microtime( TRUE );
print "array_key_exists( \$my_array_ref ) ".($stop-$start).PHP_EOL;
unset( $my_array, $my_array_ref, $start, $stop, $i );

$my_array = array();
$my_array_ref = &$my_array;
$start = microtime( TRUE );
for( $i = 1; $i < 10000; $i++ ) {
    isset( $my_array_ref[$i] );
    $my_array_ref[$i] = 0;
}
$stop = microtime( TRUE );
print "isset( \$my_array_ref ) ".($stop-$start).PHP_EOL;
unset( $my_array, $my_array_ref, $start, $stop, $i );

$my_array = array();
$my_array_ref = &$my_array;
$start = microtime( TRUE );
for( $i = 1; $i < 10000; $i++ ) {
    isset_( $i, $my_array_ref );
    $my_array_ref[$i] = 0;
}
$stop = microtime( TRUE );
print "isset_( \$my_array_ref ) ".($stop-$start).PHP_EOL;
unset( $my_array, $my_array_ref, $start, $stop, $i );
Output
array_key_exists( $my_array ) 0.0056459903717
isset( $my_array ) 0.00234198570251
isset_( $my_array ) 0.00539588928223
array_key_exists( $my_array_ref ) 3.64232587814 // <~ what on earth?
isset( $my_array_ref ) 0.00222992897034
isset_( $my_array_ref ) 4.12856411934 // <~ what on earth?
I'm on PHP 5.3.6.
It isn't array_key_exists that causes this, but the removal of the reference (= NULL). I commented it out of your script and this is the result:
I only removed the unsetting from the array_key_exists( $my_array_ref ) part; this is the modified part for reference:
Here is the source of the array_key_exists function for 5.2.17. You can see that even if the key is null, PHP attempts to compute a hash. Interestingly, though, if you remove
then it performs better. There must be multiple hash lookups occurring.
At work I've got a VM instance of PHP that includes a PECL extension called VLD. It lets you run a PHP script from the command line and, rather than executing it, dumps the generated opcodes.
It's brilliant at answering questions like this.
http://pecl.php.net/package/vld
Just in case you go this route (and if you're generally curious about how PHP works internally, I think you should), you should definitely install it on a virtual machine (that is, I wouldn't install it on a machine I'm trying to develop on or deploy to). And this is the command you'll use to make it sing:
Looking at the opcodes will tell you a more complete story; however, I have a guess. Most of PHP's built-ins make a copy of an Array/Object and act upon that copy (and not a copy-on-write either, an immediate copy). The most widely known example of this is foreach(). When you pass an array into foreach(), PHP is actually making a copy of that array and iterating on the copy. Which is why you'll see a significant performance benefit by passing an array as a reference into foreach like this:
foreach($someReallyBigArray as $k => &$v)
But this behavior, where passing in an explicit reference like that avoids the copy, is unique to foreach(). So I would be very surprised if it made an array_key_exists() check any faster.
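For anyone who hasn't used the by-reference form of foreach, here is a minimal sketch of what it changes semantically. It does not measure the performance claim above (that depends on the PHP version); it only shows that $v aliases the element instead of holding a copy. The $prices array and the $afterByValue variable are made up for illustration.

```php
<?php
$prices = array('apple' => 1, 'pear' => 2, 'plum' => 3);

// By value: $v holds a copy of each element, so the array is untouched.
foreach ($prices as $k => $v) {
    $v = $v * 2;
}
$afterByValue = $prices;

// By reference: $v aliases each element, so assignments write through.
foreach ($prices as $k => &$v) {
    $v = $v * 2;
}
unset($v); // drop the lingering reference to the last element

print_r($afterByValue);
print_r($prices);
```

The unset($v) after the second loop matters: without it, $v stays bound to the last element and a later assignment to $v would silently change the array.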
Ok, back to what I was getting at..
Most of the built-ins take a copy of an array and act upon that copy. I am going to venture a completely unqualified guess that isset() is highly optimized and that one of those optimizations is perhaps to not do an immediate copy of an Array when it's passed in.
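One thing that can be checked without reading opcodes: isset() is a language construct rather than a function, so it never goes through the normal argument-passing path that array_key_exists() does. The sketch below shows that, plus the one behavioral difference to keep in mind before swapping one for the other. The $row array and $isFunction variable are invented for the example.

```php
<?php
$row = array('id' => 7, 'note' => null);

// function_exists() only sees real functions, not language constructs.
$isFunction = array(
    'isset'            => function_exists('isset'),            // false
    'array_key_exists' => function_exists('array_key_exists'), // true
);
var_dump($isFunction);

// The key 'note' is present but holds null:
var_dump(isset($row['note']));            // bool(false)
var_dump(array_key_exists('note', $row)); // bool(true)
```

So isset() is only a drop-in replacement when null values either can't occur or should count as "not set".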
I'll try to answer any other questions you may have, but you could probably learn a lot if you google for "zval_struct", which is the data structure in the PHP internals that stores each variable. It's a C struct (think: an associative array) that has keys like "value", "type", "refcount".