I've always heard and searched for new php 'good writing practice', for example: It's better (for performance) to check if array key exists than search in array, but also it seems better for memory too:
Assuming we have:
$array = array
(
'one' => 1,
'two' => 2,
'three' => 3,
'four' => 4,
);
this allocates 1040 bytes of memory,
and
$array = array
(
1 => 'one',
2 => 'two',
3 => 'three',
4 => 'four',
);
requires 1136 bytes
I understand that the key
and value
surely will have different storing mechanism, but
please can you actually point me to the principle how does it work?
Example 2 (for @teuneboon):
$array = array
(
'one' => '1',
'two' => '2',
'three' => '3',
'four' => '4',
);
1168 bytes
$array = array
(
'1' => 'one',
'2' => 'two',
'3' => 'three',
'4' => 'four',
);
1136 bytes
consuming same memory:
4 => 'four',
'4' => 'four',
Note, answer below is applicable for PHP prior to version 7 as in PHP 7 major changes were introduced which also involve values structures.
TL;DR
Your question is not actually about "how memory works in PHP" (here, I assume, you meant "memory allocation"), but about "how arrays work in PHP" - and these two questions are different. To summarize what's written below:
ulong
(unsigned long) keys hash-map, so real difference will be in values, where string-keys option has integer (fixed-length) values, while integer-keys option has strings (chars-dependent length) values. But that may not always will be true due to possible collisions.'4'
, will be treated as integer keys and translated into integer hash result as it was integer key. Thus,'4'=>'foo'
and4 => 'foo'
are same things.Also, important note: the graphics here are copyright of PHP internals book
Hash-map for PHP arrays
PHP arrays and C arrays
You should realize one very important thing: PHP is written on C, where such things as "associative array" simply does not exist. So, in C "array" is exactly what "array" is - i.e. it's just a consecutive area in memory which can be accessed by a consecutive offset. Your "keys" may be only numeric, integer and only consecutive, starting from zero. You can't have, for instance,
3
,-6
,'foo'
as your "keys" there.So to implement arrays, which are in PHP, there's hash-map option, it uses hash-function to hash your keys and transform them to integers, which can be used for C-arrays. That function, however, will never be able to create a bijection between string keys and their integer hashed results. And it's easy to understand why: because cardinality of strings set is much, much larger that cardinality of integer set. Let's illustrate with example: we'll recount all strings, up to length 10, which have only alphanumeric symbols (so,
0-9
,a-z
andA-Z
, total 62): it's 6210 total strings possible. It's around 8.39E+17. Compare it with around 4E+9 which we have for unsigned integer (long integer, 32-bits) type and you'll get the idea - there will be collisions.PHP hash-map keys & collisions
Now, to resolve collisions, PHP will just place items, which have same hash-function result, into one linked list. So, hash-map would not be just "list of hashed elements", but instead it will store pointers to lists of elements (each element in certain list will have same hash-function key). And this is where you have point to how it will affect memory allocation: if your array has string keys, which did not result in collisions, then no additional pointers inside those list would be needed, so memory amount will be reduced (actually, it's a very small overhead, but, since we're talking about precise memory allocation, this should be taken to account). And, same way, if your string keys will result into many collisions, then more additional pointers would be created, so total memory amount will be a bit more.
To illustrate those relations within those lists, here's a graphic:
Above there is how PHP will resolve collisions after applying hash-function. So one of your question parts lies here, pointers inside collision-resolution lists. Also, elements of linked lists are usually called buckets and the array, which contains pointers to heads of those lists is internally called
arBuckets
. Due to structure optimization (so, to make such things as element deletion, faster), real list element has two pointers, previous element and next element - but that's only will make difference in memory amount for non-collision/collision arrays little wider, but won't change concept itself.One more list: order
To fully support arrays as they are in PHP, it's also needed to maintain order, so that is achieved with another internal list. Each element of arrays is a member of that list too. It won't make difference in terms of memory allocation, since in both options this list should be maintained, but for full picture, I'm mentioning this list. Here's the graphic:
In addition to
pListLast
andpListNext
, pointers to order-list head and tail are stored. Again, it's not directly related to your question, but further I'll dump internal bucket structure, where these pointers are present.Array element from inside
Now we're ready to look into: what is array element, so, bucket:
Here we are:
h
is an integer (ulong) value of key, it's a result of hash-function. For integer keys it is just same as key itself (hash-function returns itself)pNext
/pLast
are pointers inside collision-resolution linked listpListNext
/pListLast
are pointers inside order-resolution linked listpData
is a pointer to the stored value. Actually, value isn't same as inserted at array creation, it's copy, but, to avoid unnecessary overhead, PHP usespDataPtr
(sopData = &pDataPtr
)From this viewpoint, you may get next thing to where difference is: since string key will be hashed (thus,
h
is alwaysulong
and, therefore, same size), it will be a matter of what is stored in values. So for your string-keys array there will be integer values, while for integer-keys array there will be string values, and that makes difference. However - no, it isn't a magic: you can't "save memory" with storing string keys such way all the times, because if your keys would be large and there will be many of them, it will cause collisions overhead (well, with very high probability, but, of course, not guaranteed). It will "work" only for arbitrary short strings, which won't cause many collisions.Hash-table itself
It's already been spoken about elements (buckets) and their structure, but there's also hash-table itself, which is, in fact, array data-structure. So, it's called
_hashtable
:I won't describe all the fields, since I've already provided much info, which is only related to the question, but I'll describe this structure briefly:
arBuckets
is what was described above, the buckets storage,pListHead
/pListTail
are pointers to order-resolution listnTableSize
determines size of hash-table. And this is directly related to memory allocation:nTableSize
is always power of 2. Thus, it's no matter if you'll have 13 or 14 elements in array: actual size will be 16. Take that to account when you want to estimate array size.Conclusion
It's really difficult to predict, will one array be larger than another in your case. Yes, there are guidelines which are following from internal structure, but if string keys are comparable by their length to integer values (like
'four'
,'one'
in your sample) - real difference will be in such things as - how many collisions occurred, how many bytes were allocated to save the value.But choosing proper structure should be matter of sense, not memory. If your intention is to build the corresponding indexed data, then choice always be obvious. Post above is only about one goal: to show how arrays actually work in PHP and where you can find the difference in memory allocation in your sample.
You may also check article about arrays & hash-tables in PHP: it's Hash-tables in PHP by PHP internals book: I've used some graphics from there. Also, to realize, how values are allocated in PHP, check zval Structure article, it may help you to understand, what will be differences between strings & integers allocation for values of your arrays. I didn't include explanations from it here, since much more important point for me - is to show array data structure and what may be difference in context of string keys/integer keys for your question.
Arrays in PHP are implemented as hashmaps. Hence the length of the value you use for the key has little impact on the data requirement. In older versions of PHP there was a significant performance degradation with large arrays as the hash size was fixed at array creation - when collisions starting occurring then increasing numbers of hash values would map to linked lists of values which then had to be further searched (with an O(n) algorithm) instead of a single value, but more recently the hash appears to either use a much larger default size or is resized dynamically (it just works - I can't really be bothered reading the source code).
Saving 4 bytes from your scripts is not going to cause Google any sleepless nights. If you are writing code which uses large arrays (where the savings may be more significant) you're probably doing it wrong - the time and resource taken to fill up the array could be better spent elsewhere (like indexed storage).
Although both arrays are accessed in a different way (i.e. via string or integer value), the memory pattern is mostly similar.
This is because the string allocation either happens as part of the zval creation or when a new array key needs to be allocated; the small difference being that numeric indices don't require a whole zval structure, because they're stored as an (unsigned) long.
The observed differences in memory allocation are so minimal that they can be largely attributed to either the inaccuracy of
memory_get_usage()
or allocations due to additional bucket creation.Conclusion
How you want to use your array must be the guiding principle in choosing how it should be indexed; memory should only become an exception to this rule when you run out of it.
From PHP manual Garbage Collection http://php.net/manual/en/features.gc.php
PHP does not return released memory very well; Its primary usage online does not require it and effective garbage collection takes time away from providing the output; When the script ends the memory is going to be returned anyway.
Garbage collection happens.
When you tell it to
int gc_collect_cycles ( void )
When you leave a function
Better understanding of PHP's Garbage collection from a web host, (no affiliation). http://www.sitepoint.com/better-understanding-phps-garbage-collection/
If you are considering byte by byte how the data is set in memory. Different ports are going to effect those values. 64bit CPUs performance is best when data sits on the first bit of a 64bit word. For the max performance a specific binary they would allocate the start of a block of memory on the first bit, leaving up to 7 bytes unused. This CPU specific stuff depends on what compiler was used to compile the PHP.exe. I can not offer any way to predict exact memory usage, given that it will be determined differently by different compilers.
Alma Do, post goes to the specifics of the source which is sent to the compiler. What the PHP source requests and the compiler optimizes.
Looking at the specific examples you posted. When the key is a ascii letter they are taking 4 bytes (64 bits) more per entry ... this suggests to me, (assuming no garbage or memory holes, ect), that the ascii keys are greater than 64 bits, but the numeric keys are fit in a 64bit word. It suggests to me your using a 64bit computer and your PHP.exe is compiled for 64bit CPUs.