My app needs to produce JSON for an object that has a large data property of type array. The array needs to remain in memory as it collects DB output, and some properties can only be determined once the array is complete.
Complication: the array is numerically indexed and must appear as such in the JSON output, so a plain json_encode() is not an option.
To make this possible on low-spec machines like a RasPi, I've looked into trimming memory consumption:
- Use SplFixedArray
- Use string and pack()
Both approaches take care of the array storage memory problem but fail when it comes to encoding in JSON.
I've looked into implementing JsonSerializable, but since it requires jsonSerialize() to return a value that PHP then encodes as JSON, I'm back to
public function jsonSerialize() {
return $this->toArray();
}
which has the same memory problems.
zendframework/Component_ZendJson looks promising, as it looks for objects that have a toJson() method and lets them provide their own encoding as a string instead of an object.
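The idea of an object supplying its own JSON string can be sketched roughly as follows. This is only an illustration, not the Zend\Json implementation: the class name `RowStore` and its `add()` method are hypothetical, the sample values are made up, and typed properties assume PHP 7.4 or later. Each row is encoded on its own, so no full plain-array copy of the data is built; only the output string grows.

```php
<?php
// Hypothetical container (not part of Zend\Json): rows are kept in nested
// SplFixedArrays and toJson() encodes them one row at a time.
final class RowStore
{
    private SplFixedArray $rows;
    private int $count = 0;

    public function __construct(int $capacity)
    {
        $this->rows = new SplFixedArray($capacity);
    }

    public function add(array $tuple): void
    {
        // Store the tuple as an inner SplFixedArray with keys reset to 0..n-1.
        $this->rows[$this->count++] = SplFixedArray::fromArray($tuple, false);
    }

    public function toJson(): string
    {
        $json = '[';
        for ($i = 0; $i < $this->count; $i++) {
            if ($i > 0) {
                $json .= ',';
            }
            // Only one short-lived plain array exists at any time.
            $json .= json_encode($this->rows[$i]->toArray());
        }
        return $json . ']';
    }
}

$store = new RowStore(2);
$store->add([1446000000, 21.5]);
$store->add([1446000060, 22.25]);
echo $store->toJson(), "\n";
```

An encoder that checks for a toJson() method could then use the returned string directly instead of walking the object.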
I'm wondering if there are better options that don't give memory issues?
In my investigation I've looked at 5 different approaches for storing large arrays of tuples in memory, summarized here with their results (sampled at 50k records):
1. Naive
Exporting JSON is straightforward with json_encode() using array(array(), array()).
Memory: 18.5MB (huge)
Time: ~100ms to build and dump the array (Windows PC)
2. SPL Library
This approach stores everything in nested SplFixedArrays: SplFixedArray[SplFixedArray]. JSON export was done by extending Zend\Json\Encoder and implementing the toJson method.
Memory: 15.5MB (still large)
Time: ~1.3s, x10 slower
3. SPL Library
Similar to 2, but instead of the inner SplFixedArray it uses packed strings from PHP's pack() function.
Memory: 3.5MB (5 times smaller)
Time: ~1.3s, x10 slower - apparently pack() is about as slow as the nested array.
4. SPL Library
Similar to 2, but instead of using an inner SplFixedArray, the actual tuples are simply written as sequential values to the root array.
Memory: 3.25MB (again smaller)
Time: ~0.7s, only x6 slower - do we have a winner here?
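The flat layout described here comes down to index arithmetic: value j of tuple i lives at slot i * width + j of the root array. A minimal sketch, assuming tuples of a fixed width; the class `FlatTupleStore`, its methods, and the sample values are hypothetical and not the code that was benchmarked:

```php
<?php
// Hypothetical flat store: tuple $i occupies slots
// $i * $width .. ($i + 1) * $width - 1 of a single SplFixedArray.
final class FlatTupleStore
{
    private SplFixedArray $data;
    private int $width;
    private int $count = 0;

    public function __construct(int $capacity, int $width)
    {
        $this->width = $width;
        $this->data  = new SplFixedArray($capacity * $width);
    }

    public function add(array $tuple): void
    {
        $base = $this->count * $this->width;
        for ($j = 0; $j < $this->width; $j++) {
            $this->data[$base + $j] = $tuple[$j];
        }
        $this->count++;
    }

    public function count(): int
    {
        return $this->count;
    }

    // Rebuild tuple $i as a short-lived plain array.
    public function get(int $i): array
    {
        $tuple = [];
        $base  = $i * $this->width;
        for ($j = 0; $j < $this->width; $j++) {
            $tuple[] = $this->data[$base + $j];
        }
        return $tuple;
    }
}

$store = new FlatTupleStore(2, 2);
$store->add([1446000000, 21.5]);
$store->add([1446000060, 22.25]);

// Encode tuple by tuple so the nested structure reappears in the JSON.
$json = '[';
for ($i = 0; $i < $store->count(); $i++) {
    $json .= ($i ? ',' : '') . json_encode($store->get($i));
}
$json .= ']';
echo $json, "\n";
```

The saving comes from having one container instead of one per tuple; the cost is the per-element loop in userland PHP, which fits the slower timings reported above.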
5. pack()
Similar to 3, but instead of the root SplFixedArray everything is packed into a single string using PHP's pack() function. This obviously requires knowing the structure of the individual arrays, which must be fixed and identical.
Memory: 1.25MB (really small - only 1/12th of original memory)
Time: ~1.7s, x16 slower
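A rough sketch of the single-string idea follows. The record layout is an assumption made for illustration (a 32-bit unsigned timestamp plus a machine-format double, 12 bytes per record); the original post does not say which pack() format was used. The `[$ts, $val]` destructuring assumes PHP 7.1 or later.

```php
<?php
// Assumed record layout (not from the original post):
// 'V' = 32-bit unsigned little-endian integer, 'd' = double, 12 bytes total.
const RECORD_FORMAT = 'Vts/dval';
const RECORD_SIZE   = 12;

// Append every record to one binary string.
$buffer = '';
foreach ([[1446000000, 21.5], [1446000060, 22.25]] as [$ts, $val]) {
    $buffer .= pack('Vd', $ts, $val);
}

// Decode one record at a time and encode it straight to JSON.
$json  = '[';
$count = intdiv(strlen($buffer), RECORD_SIZE);
for ($i = 0; $i < $count; $i++) {
    $rec   = unpack(RECORD_FORMAT, substr($buffer, $i * RECORD_SIZE, RECORD_SIZE));
    $json .= ($i ? ',' : '') . json_encode(array_values($rec));
}
$json .= ']';
echo $json, "\n";
```

Because every record has the same size, record i starts at byte offset i * RECORD_SIZE, which is why the structure has to be fixed and identical.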
CONCLUSION
While (5) offers the best memory utilization, it is also extremely slow. For my purposes I've settled on (4), which uses about 20% of the original memory but, when JSON encoding is taken into account, is also 5 to 6 times slower. An acceptable compromise.
According to json.org:
JSON is built on two structures:
- A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
- An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.
I don't know if there are memory issues with this but consider the following code:
<?php
// No declared index causes no index in JSON
$arr = array('dssdf','38904uj');
echo json_encode($arr).'<br><br>';
// ["dssdf","38904uj"]
// start array at 0 removes the index from JSON
$arr = array('0'=>'dssdf','1'=>'38904uj');
echo json_encode($arr).'<br><br>';
// ["dssdf","38904uj"]
// start array at 1 forces the index to show in JSON
$arr = array('1'=>'dssdf','2'=>'38904uj');
echo json_encode($arr).'<br><br>';
// {"1":"dssdf","2":"38904uj"}
// skip an index forces the index to show in JSON
$arr = array('0'=>'dssdf','1'=>'38904uj','3'=>'321as5d4');
echo json_encode($arr).'<br><br>';
// {"0":"dssdf","1":"38904uj","3":"321as5d4"}
// JSON_FORCE_OBJECT option forces indexes
$arr = array('0'=>'dssdf','1'=>'38904uj');
echo json_encode($arr, JSON_FORCE_OBJECT).'<br><br>';
// {"0":"dssdf","1":"38904uj"}
?>
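The pattern in these examples is that json_encode() emits a JSON array only when the keys are exactly 0, 1, ..., n-1 in that order. A small sketch of checking for that and re-indexing with array_values() when it does not hold; the helper name `encodes_as_json_array` is hypothetical (PHP 8.1 and later ship array_is_list() for the same check):

```php
<?php
// Hypothetical helper: true when the keys are exactly 0..n-1 in order,
// which is the case in which json_encode() emits a JSON array.
function encodes_as_json_array(array $arr): bool
{
    return $arr === [] || array_keys($arr) === range(0, count($arr) - 1);
}

$sparse = [0 => 'dssdf', 1 => '38904uj', 3 => '321as5d4'];

var_dump(encodes_as_json_array($sparse));       // bool(false)
echo json_encode($sparse), "\n";                // {"0":"dssdf","1":"38904uj","3":"321as5d4"}

// array_values() drops the original keys and re-indexes from 0.
echo json_encode(array_values($sparse)), "\n";  // ["dssdf","38904uj","321as5d4"]
```

Note that array_values() builds a new array, so on very large data it adds to memory use.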