General Problem
Whether I'm diagnosing the root cause of an event, determining how many users it affected, or distilling timing logs to assess the performance and throughput impact of a recent code change, my tools stay the same: grep, awk, sed, tr, uniq, sort, zcat, tail, head, join, and split. To glue them all together, Unix gives us pipes, and for fancier filtering we have xargs. If these fail me, there's always perl -e.
These tools are perfect for processing CSV files, tab-delimited files, log files with a predictable line format, or files with comma-separated key-value pairs. In other words, files where each line has next to no context.
XML Analogues
I recently needed to trawl through gigabytes of XML to build a histogram of usage by user. This was easy enough with the tools I had, but for more complicated queries the usual line-oriented approaches break down. Say I have files with items like this:
<foo user="me">
<baz key="zoidberg" value="squid" />
<baz key="leela" value="cyclops" />
<baz key="fry" value="rube" />
</foo>
And let's say I want to produce a mapping from user to average number of <baz>s per <foo>. Processing line-by-line is no longer an option: I need to know which user's <foo> I'm currently inspecting so I know whose average to update. Any Unix one-liner that accomplishes this task is likely to be inscrutable.
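For comparison, the same task takes only a few lines in a general-purpose language. Here is a minimal Python sketch (an illustration, not part of my actual workflow) that streams the XML with the standard library's ElementTree.iterparse. It assumes each file wraps its <foo> elements in a single root element so it parses as one document; the function name baz_averages and the sample data (including the second user) are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def baz_averages(source):
    """Map each user to the average number of <baz> children per <foo>.

    `source` is a filename or a binary file object. Assumes the <foo>
    elements sit under a single root element, so the input parses as one
    well-formed XML document.
    """
    foo_count = defaultdict(int)  # user -> number of <foo> elements seen
    baz_total = defaultdict(int)  # user -> <baz> children summed over those <foo>s
    # iterparse streams the input instead of building the whole tree up front.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "foo":
            user = elem.get("user")  # read before clear() wipes attributes
            foo_count[user] += 1
            baz_total[user] += len(elem.findall("baz"))
            elem.clear()  # discard the finished <foo>'s children to save memory
    return {user: baz_total[user] / foo_count[user] for user in foo_count}


# Hypothetical sample: two <foo>s for "me" (3 and 1 <baz>), one for "you".
sample = b"""<root>
<foo user="me">
<baz key="zoidberg" value="squid" />
<baz key="leela" value="cyclops" />
<baz key="fry" value="rube" />
</foo>
<foo user="me">
<baz key="bender" value="robot" />
</foo>
<foo user="you">
<baz key="hermes" value="bureaucrat" />
</foo>
</root>"""

print(baz_averages(io.BytesIO(sample)))  # {'me': 2.0, 'you': 1.0}
```

The state that a line-oriented tool lacks is simply "which <foo> am I inside", which iterparse gives for free by reporting each <foo> only once its children have been parsed.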
Fortunately in XML-land, we have wonderful technologies like XPath, XQuery, and XSLT to help us.
Previously, I had gotten accustomed to using the wonderful XML::XPath Perl module to accomplish queries like the one above, but after finding a TextMate plugin that could run an XPath expression against my current window, I stopped writing one-off Perl scripts to query XML. And I just found out about XMLStarlet, which is installing as I type this and which I look forward to using in the future.
JSON Solutions?
So this leads me to my question: are there any tools like this for JSON? It's only a matter of time before some investigation task requires me to do similar queries on JSON files, and without tools like XPath and XSLT, such a task will be a lot harder. If I had a bunch of JSON that looked like this:
{
"firstName": "Bender",
"lastName": "Robot",
"age": 200,
"address": {
"streetAddress": "123",
"city": "New York",
"state": "NY",
"postalCode": "1729"
},
"phoneNumber": [
{ "type": "home", "number": "666 555-1234" },
{ "type": "fax", "number": "666 555-4567" }
]
}
And wanted to find the average number of phone numbers each person had, I could do something like this with XPath:
fn:avg(/fn:count(phoneNumber))
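Without an XPath equivalent, the same aggregate has to be scripted. A minimal Python sketch, assuming each JSON document holds one person object shaped like the Bender example above; the function name avg_phone_numbers and the second record are hypothetical:

```python
import json


def avg_phone_numbers(documents):
    """Average length of the "phoneNumber" array across JSON documents.

    Each item in `documents` is the text of one JSON object shaped like the
    example above; a missing "phoneNumber" key counts as zero numbers.
    Raises ZeroDivisionError if `documents` is empty.
    """
    counts = [len(json.loads(doc).get("phoneNumber", [])) for doc in documents]
    return sum(counts) / len(counts)


bender = """{
  "firstName": "Bender",
  "phoneNumber": [
    { "type": "home", "number": "666 555-1234" },
    { "type": "fax", "number": "666 555-4567" }
  ]
}"""
fry = '{ "firstName": "Fry", "phoneNumber": [] }'  # hypothetical second record

print(avg_phone_numbers([bender, fry]))  # 1.0
```

To run it over files on disk, you could pass something like (pathlib.Path(p).read_text() for p in sys.argv[1:]). It works, but it is exactly the kind of one-off script a query language would make unnecessary.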
Questions
- Are there any command-line tools that can "query" JSON files in this way?
- If you have to process a bunch of JSON files on a Unix command line, what tools do you use?
- Heck, is there even work being done to make a query language like this for JSON?
- If you do use tools like this in your day-to-day work, what do you like/dislike about them? Are there any gotchas?
I'm noticing more and more data serialization is being done using JSON, so processing tools like this will be crucial when analyzing large data dumps in the future. Language libraries for JSON are very strong and it's easy enough to write scripts to do this sort of processing, but to really let people play around with the data, shell tools are needed.
I just found this:
http://stedolan.github.com/jq/
"jq is a lightweight and flexible command-line JSON processor."
2014 update:
@user456584 mentioned:
In the json README at http://github.com/trentm/json there is a long list of similar things.
One way you could do it is to convert it to XML. The following uses two Perl modules (JSON and XML::Simple) to do on-the-fly conversion:
which, for your example JSON, ends up as:
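As an illustration of the same convert-then-query idea (not a reproduction of the Perl/XML::Simple script), here is a minimal Python sketch using only the standard library. The mapping is an assumption of this sketch, not XML::Simple's behavior: object keys become child elements, each array item becomes a repeated element named after its key, and the root tag "person" is arbitrary. The function name to_xml is hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def to_xml(value, tag="person"):
    """Convert parsed JSON into an ElementTree element (hypothetical mapping).

    Object keys become child elements, each array item becomes a repeated
    element named after its key, and scalars become element text (non-string
    scalars are written in JSON notation, e.g. 200, true, null). Keys that are
    not valid XML names are not handled.
    """
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            if isinstance(child, list):
                # One element per array item, all sharing the key's name.
                for item in child:
                    elem.append(to_xml(item, key))
            else:
                elem.append(to_xml(child, key))
    else:
        # Strings are used as-is; other scalars (and arrays that are not
        # object values, e.g. nested arrays) are kept as JSON text.
        elem.text = value if isinstance(value, str) else json.dumps(value)
    return elem


person = json.loads("""{
  "firstName": "Bender",
  "age": 200,
  "phoneNumber": [
    { "type": "home", "number": "666 555-1234" },
    { "type": "fax", "number": "666 555-4567" }
  ]
}""")
root = to_xml(person)
print(ET.tostring(root, encoding="unicode"))
# ElementTree's limited XPath support can now count the phone numbers.
print(len(root.findall("phoneNumber")))  # 2
```

Repeating the element per array item (rather than wrapping the array in a container element) is what lets a path like phoneNumber select each number directly, which is the shape the XPath query in the question expects.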