I need to store a multi-dimensional associative array of data in a flat file for caching purposes. I might occasionally come across the need to convert it to JSON for use in my web app but the vast majority of the time I will be using the array directly in PHP.
Would it be more efficient to store the array as JSON or as a PHP serialized array in this text file? I've looked around and it seems that in the newest versions of PHP (5.3), json_decode is actually faster than unserialize.
I'm currently leaning towards storing the array as JSON because I feel it's easier for a human to read if necessary, it can be used in both PHP and JavaScript with very little effort, and from what I've read, it might even be faster to decode (not sure about encoding, though).
Does anyone know of any pitfalls? Anyone have good benchmarks to show the performance benefits of either method?
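For concreteness, the two options being compared might look like the sketch below. The sample array and the file names are hypothetical stand-ins, not taken from the question; the same data is written to a flat file in both formats and read back.

```php
<?php
// Hypothetical sample data standing in for the real cached array.
$data = [
    'users' => [
        ['id' => 1, 'name' => 'alice', 'tags' => ['admin', 'dev']],
        ['id' => 2, 'name' => 'bob', 'tags' => []],
    ],
    'generated_at' => 1300000000,
];

// Hypothetical cache locations in the system temp directory.
$jsonFile = sys_get_temp_dir() . '/cache_example.json';
$serFile  = sys_get_temp_dir() . '/cache_example.ser';

// Write the same array in both formats, locking the file while writing.
file_put_contents($jsonFile, json_encode($data), LOCK_EX);
file_put_contents($serFile, serialize($data), LOCK_EX);

// Read it back. Passing true to json_decode() asks for associative arrays
// instead of stdClass objects.
$fromJson = json_decode(file_get_contents($jsonFile), true);
$fromSer  = unserialize(file_get_contents($serFile));

var_dump($fromJson === $data); // bool(true)
var_dump($fromSer === $data);  // bool(true)
```

Both round trips reproduce this particular array exactly; which one is faster depends on the data and the PHP version, so it is worth measuring on the real cache.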
JSON is simpler and faster than PHP's serialization format and should be used unless:
json_decode(): "This function will return false if the JSON encoded data is deeper than 127 elements."
I've tested this very thoroughly on a fairly complex, mildly nested multi-hash with all kinds of data in it (string, NULL, integers), and serialize/unserialize ended up much faster than json_encode/json_decode.
The only advantage JSON had in my tests was its smaller 'packed' size.
These tests were run under PHP 5.3.3; let me know if you want more details.
Here are the test results, followed by the code that produced them. I can't provide the test data since it would reveal information that I can't let out into the wild.
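The original results and benchmark code are not reproduced here. As a rough sketch of how such a comparison could be run, the script below times both pairs of functions on generated data. The helper names makeSampleData and timeIt, the row count, and the iteration count are all assumptions made for illustration, and the numbers it prints will vary by machine and PHP version.

```php
<?php
// Build a hypothetical nested array mixing strings, integers, floats and NULL.
function makeSampleData($rows)
{
    $data = array();
    for ($i = 0; $i < $rows; $i++) {
        $data["row_$i"] = array(
            'id'    => $i,
            'name'  => "item $i",
            'price' => $i * 1.5,
            'note'  => null,
            'tags'  => array('a', 'b', 'c'),
        );
    }
    return $data;
}

// Return the average wall-clock seconds per call over $iterations calls.
function timeIt($fn, $iterations)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (microtime(true) - $start) / $iterations;
}

$data = makeSampleData(1000);
$json = json_encode($data);
$ser  = serialize($data);
$runs = 20;

$results = array(
    'json_encode' => timeIt(function () use ($data) { json_encode($data); }, $runs),
    'json_decode' => timeIt(function () use ($json) { json_decode($json, true); }, $runs),
    'serialize'   => timeIt(function () use ($data) { serialize($data); }, $runs),
    'unserialize' => timeIt(function () use ($ser) { unserialize($ser); }, $runs),
);

foreach ($results as $name => $seconds) {
    printf("%-12s %.6f s per call\n", $name, $seconds);
}
printf("JSON size: %d bytes, serialized size: %d bytes\n", strlen($json), strlen($ser));
```

Running it against a copy of the real data, rather than generated rows, is the only way to know which format wins for a given workload.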
JSON is better if you want to back up data and restore it on a different machine or via FTP.
For example, if you store serialized data on a Windows server, download it via FTP and restore it on a Linux one, it may no longer work because of character re-encoding: serialize stores the byte length of each string, and when the file is transcoded to UTF-8 a character that was 1 byte can become 2 bytes long, so the stored length no longer matches and unserialize fails.
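A small sketch of that failure mode, using a hypothetical one-word string and simulating the re-encoding with a plain byte replacement rather than a real FTP transfer:

```php
<?php
// "café" in a single-byte encoding (ISO-8859-1): é is the single byte 0xE9.
$latin1     = "caf\xE9";
$serialized = serialize($latin1); // s:4:"caf<0xE9>"; with a 4-byte length prefix

// Simulate a transfer that re-encodes the file to UTF-8,
// where é becomes the two bytes 0xC3 0xA9.
$transcoded = str_replace("\xE9", "\xC3\xA9", $serialized);

// The prefix still says 4 bytes but the payload is now 5, so decoding fails.
// The @ only hides the notice/warning PHP emits for the malformed input.
$result = @unserialize($transcoded);
var_dump($result); // bool(false)

// By default json_encode() escapes non-ASCII characters as \uXXXX,
// so its output is pure ASCII and is unaffected by this kind of re-encoding.
$utf8 = "caf\xC3\xA9";
$json = json_encode($utf8); // "caf\u00e9"
var_dump(json_decode($json) === $utf8); // bool(true)
```

Note that json_encode() expects UTF-8 input, so the JSON half of the sketch starts from a UTF-8 string.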
Really nice topic and after reading the few answers, I want to share my experiments on the subject.
I have a use case where some "huge" table needs to be queried almost every time I talk to the database (don't ask why, just a fact). The database caching system isn't appropriate as it won't cache the different requests, so I thought about PHP caching systems.
I tried apcu but it didn't fit the needs; memory isn't reliable enough in this case. The next step was to cache into a file with serialization.
The table has 14355 entries with 18 columns. These are my tests and stats on reading the serialized cache:
JSON:
As you all said, the major inconvenience with json_encode/json_decode is that it transforms everything into a stdClass instance (or object). If you need to loop over it, you'll probably convert it to an array, and yes, that increases the transformation time.
Msgpack
@hutch mentions msgpack. Pretty website. Let's give it a try shall we?
That's better, but it requires a new extension, and compiling sometimes scares people off...
IgBinary
@GingerDog mentions igbinary. Note that I've set igbinary.compact_strings=Off because I care more about read performance than file size.
Better than msgpack. Still, this one requires compiling too.
serialize/unserialize
Better performance than JSON; the bigger the array is, the slower json_decode is, but you already knew that.
Those external extensions narrow down the file size and seem great on paper. Numbers don't lie*. What's the point of compiling an extension if you get almost the same results that you'd have with a standard PHP function?
We can also deduce that depending on your needs, you will choose something different than someone else:
That's it: another comparison of serialization methods to help you choose the right one!
*Tested with PHPUnit 3.7.31, PHP 5.5.10, decoding only, with a standard hard drive and an old dual-core CPU. Numbers are averages over 10 runs of the same use case; your stats might be different.
Before you make your final decision, be aware that the JSON format is not safe for associative arrays - json_decode() will return them as objects instead:
Output is:
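The original snippet and its output are not reproduced above. A minimal sketch of the behavior being described, using a hypothetical sample array, might look like this:

```php
<?php
// Hypothetical associative array, nested one level deep.
$data = array('name' => 'example', 'values' => array('a' => 1, 'b' => 2));
$json = json_encode($data);

// Default behavior: JSON objects come back as stdClass instances.
$asObject = json_decode($json);
var_dump($asObject instanceof stdClass); // bool(true)
var_dump(is_array($asObject));           // bool(false)

// Passing true as the second argument returns associative arrays instead.
$asArray = json_decode($json, true);
var_dump($asArray === $data);            // bool(true)
```

So the associative structure survives as long as the second argument to json_decode() is set to true when reading the cache back.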
Just an FYI: if you want to serialize your data to something easy to read and understand like JSON but with more compression and higher performance, you should check out MessagePack.