PHP null and copy-on-write

Posted 2019-05-25 17:01

Question:

Suppose I want to have two variables and have them both equal to null. (More realistically, I am thinking about an array that contains a large amount of nulls, but the "two variables" scenario is sufficient for the question.) Obviously, I can do this in more than one way. I can do this (method 1):

$a = null;
$b = $a;

By my understanding, the result of this is that there is one zval that is pointed to by two entries in the symbol table: 'a' and 'b'. But alternatively one might do this (method 2):

$a = null;
$b = null;

Naively one would expect that this should result in two different zvals, each pointed to by one entry in the symbol table.

Does it follow from this that if you want to have a large array, and many elements of the array will be null, it's more efficient (in terms of zval/memory use) to create a $master_null variable with the value null, and then write the null elements of the array by assigning using $master_null?

Answer 1:

Consider this script:

$arr = array();
for ($i = 0; $i < 100000; $i++) $arr[] = null;
echo memory_get_usage() . "\n";

On my machine this outputs 21687696, that is, roughly 21 MB of used memory. On the other hand, using this:

$master_null = null;
$arr = array();
for ($i = 0; $i < 100000; $i++) $arr[] = $master_null;
echo memory_get_usage() . "\n";

outputs: 13686832, which is roughly 13 MB. Based on this information, you can assume that as far as memory usage is your concern, it is indeed better to use the "master null" variable. However, you still need to have all the items in the array, and every entry in a HashTable (the internal representation of arrays) also takes some memory.
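The comparison above can be repeated as a single script that reports the growth in memory for each variant rather than the absolute figure. This is only a sketch: the function names build_with_literal and build_with_master are made up for illustration, and the numbers depend heavily on the PHP version. On PHP 7 and later, my understanding is that null is stored directly in the zval rather than in a separately allocated, refcounted container, so the two figures would be expected to come out the same; treat that as an assumption about the engine, not something the measurements above show.

```php
<?php
// Hypothetical helper: fill an array with $n literal nulls.
function build_with_literal($n)
{
    $arr = array();
    for ($i = 0; $i < $n; $i++) {
        $arr[] = null;
    }
    return $arr;
}

// Hypothetical helper: fill an array by assigning from one "master" null.
function build_with_master($n)
{
    $master_null = null;
    $arr = array();
    for ($i = 0; $i < $n; $i++) {
        $arr[] = $master_null;
    }
    return $arr;
}

// Measure how much memory usage grows while each array is built.
$before = memory_get_usage();
$a = build_with_literal(100000);
$literal_bytes = memory_get_usage() - $before;

$before = memory_get_usage();
$b = build_with_master(100000);
$master_bytes = memory_get_usage() - $before;

echo "literal null: " . $literal_bytes . " bytes\n";
echo "master null:  " . $master_bytes . " bytes\n";
```

Running it on the PHP version you actually deploy is the only way to know whether the "master null" trick buys anything there.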

If you want to dig deeper into zvals and references, I suggest using the function debug_zval_dump. With it, you can see which variables share the same zval:

$a = $b = $c = $d = "abc";
debug_zval_dump($a);
$x = $y = $z = $w = null; 
debug_zval_dump($x);
$q = null;
debug_zval_dump($q);

which outputs:

string(3) "abc" refcount(5)
NULL refcount(5)
NULL refcount(2)

And this implies that although variables $x and $q are both NULL, they are not the same zval. But $x and $y share the same zval, because they are assigned to each other. I believe you know of the function debug_zval_dump, but if not, make sure you carefully read the refcount explanation at http://php.net/manual/en/function.debug-zval-dump.php.
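To see the copy-on-write part of this, one can dump a value before and after one of the variables sharing it is written to. This is a minimal sketch, not taken from the original answer. The string is built with str_repeat on the assumption that a run-time string is an ordinary refcounted value, whereas literal strings may be interned on newer PHP versions. The exact refcount numbers printed differ between PHP versions, so none are shown here.

```php
<?php
// Build the string at run time so it is a regular refcounted value.
$a = str_repeat("ab", 2);
$b = $a;                 // no copy is made here: $a and $b share one value

debug_zval_dump($a);     // the refcount includes $a, $b and the call argument

$b .= "!";               // writing to $b separates it from $a (copy-on-write)

debug_zval_dump($a);     // the refcount is lower now: $b no longer shares it

var_dump($a);            // $a is unchanged by the write to $b
var_dump($b);
```

The point to look for is that the refcount reported for $a drops after $b is modified, while the contents of $a stay the same.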

Finally, while this information may be useful for understanding PHP internals better, I think it is of little use as an optimization, mostly because there are much better places to start optimizing scripts than such micro-optimizations. Also, since this is not part of the specification, the PHP authors might change this behaviour in the future (e.g. all NULL variables could share the same zval in some future version).



Answer 2:

From what I understand, PHP zval containers use reference counting. My impression is therefore that if you use references, i.e. &$master_null, to initialize all NULL values, you save space, because all NULL items of the array point to the same zval container.

Here is an example:

# php -r '$var1 = NULL; $var2 = $var1; $var3 = $var1; debug_zval_dump(&$var1); debug_zval_dump(&$var2); debug_zval_dump(&$var3);'
&NULL refcount(2)
&NULL refcount(2)
&NULL refcount(2)

You can read more about the reference counting basics of PHP here:

Something worth reading from this link is:

PHP is smart enough not to copy the actual variable container
when it is not necessary. Variable containers get destroyed 
when the "refcount" reaches zero. The "refcount" gets decreased by 
one when any symbol linked to the variable container leaves the 
scope (e.g. when the function ends) or when unset() is called on a symbol.

Thus every time you use &$master_null, its "refcount" is increased, and when the "refcount" reaches zero, the variable container is removed from memory.


Using the example from the answer above, here is the memory usage:

# php -r '$arr = array(); for ($i = 0; $i < 100000; $i++) $arr[] = null; echo memory_get_usage() . "\n";'
11248372
# php -r '$master_null = null; $arr = array(); for ($i = 0; $i < 100000; $i++) $arr[] = &$master_null; echo memory_get_usage() . "\n";'
6848488
# php -r '$master_null = null; $arr = array(); for ($i = 0; $i < 100000; $i++) $arr[] = $master_null; echo memory_get_usage() . "\n";'
6848468
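The near-identical figures for &$master_null and $master_null hide a difference in behaviour that matters more than the few bytes involved: elements assigned by reference stay tied to $master_null, while elements assigned by value do not. A small sketch of this follows; the array names $by_value and $by_ref are made up for illustration.

```php
<?php
$master_null = null;
$by_value = array();
$by_ref = array();

for ($i = 0; $i < 3; $i++) {
    $by_value[] = $master_null;   // plain assignment: copy-on-write semantics
    $by_ref[]   = &$master_null;  // reference: element is an alias of $master_null
}

// Changing the "master" variable later affects only the aliased elements.
$master_null = 42;

var_dump($by_value[0]); // NULL
var_dump($by_ref[0]);   // int(42)
```

So if the array elements are ever meant to be changed independently, assigning by reference is not a safe substitute for assigning by value.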


Answer 3:

No, all that would achieve is an extra variable called $master_null. The elements all point to a null either way, so having each of them point to $master_null is the same thing.