I have a large file (>50 MB) that contains a JSON hash. Something like:
{
  "obj1": {
    "key1": "val1",
    "key2": "val2"
  },
  "obj2": {
    "key1": "val1",
    "key2": "val2"
  }
  ...
}
Rather than parsing the entire file and then taking, say, the first ten elements, I'd like to parse each item in the hash one at a time. I actually don't care about the key, i.e. obj1.
If I convert the above to this:
{
  "key1": "val1",
  "key2": "val2"
}
{
  "key1": "val1",
  "key2": "val2"
}
I can easily achieve what I want using Yajl streaming:
require 'yajl'

io = File.open(path_to_file)
count = 10
# Yajl invokes the block once for each complete JSON document it finds
# in the stream, so each object is handled as soon as it's parsed.
Yajl::Parser.parse(io) do |obj|
  puts "Parsed: #{obj}"
  count -= 1
  break if count == 0
end
io.close
Is there a way to do this without having to alter the file? Some sort of callback in Yajl maybe?
I ended up solving this using JSON::Stream, which has callbacks for start_document, start_object, etc. I gave my 'parser' a to_enum method that emits each 'Resource' object as it's parsed. Note that ResourcesCollectionNode is never really used unless you completely parse the JSON stream, and ResourceNode is a subclass of ObjectNode for naming purposes only, so I might just get rid of it.
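The original node classes aren't reproduced above, but a rough sketch of the same approach with JSON::Stream could look like the following. The ResourceEnumerator class name, chunk size, and depth bookkeeping are my own illustrative choices, not the answer's actual code; it assumes the inner objects hold only scalar values, as in the question's example.

require 'json/stream'

# Sketch: track object depth in JSON::Stream callbacks and yield every
# second-level hash as soon as it has been fully parsed.
class ResourceEnumerator
  include Enumerable

  def initialize(io, chunk_size = 1024)
    @io = io
    @chunk_size = chunk_size
  end

  def each
    return to_enum(:each) unless block_given?

    depth = 0
    current = nil       # second-level object currently being built
    current_key = nil

    parser = JSON::Stream::Parser.new
    parser.start_object do
      depth += 1
      current = {} if depth == 2
    end
    parser.end_object do
      if depth == 2 && current
        yield current   # hand the finished object to the caller
        current = nil
      end
      depth -= 1
    end
    parser.key   { |k| current_key = k }
    parser.value { |v| current[current_key] = v if depth == 2 && current }

    # Feed the file to the parser in small chunks instead of reading it all.
    while (chunk = @io.read(@chunk_size))
      parser << chunk
    end
  end
end

Used on the file from the question, something like this would print the first ten inner objects without ever materialising the whole hash:

io = File.open(path_to_file)
ResourceEnumerator.new(io).first(10).each do |obj|
  puts "Parsed: #{obj}"
end
io.close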
I faced the same problem and created the gem json-streamer, which saves you from having to write your own callbacks.
The usage in your case would be (v 0.4.0):
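The snippet itself isn't included above; based on the gem's README around that version, the usage would look roughly like this (the file path is a placeholder, and later releases have changed the API, so check the current README):

require 'json/streamer'

file_stream = File.open(path_to_file)
streamer = Json::Streamer::JsonStreamer.new(file_stream)

# nesting_level: 1 yields every value directly below the JSON root,
# i.e. the inner objects without their "obj1"/"obj2" keys.
streamer.get(nesting_level: 1) do |object|
  puts "Parsed: #{object}"
end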
Applying it to your example, it will yield the objects without the 'obj' keys, i.e. {"key1"=>"val1", "key2"=>"val2"} for each entry.
Let me know if you managed to try it.