I need to count the number of objects in each group with jq, but only for the N most recent objects.
Sample input, for N=3:
{"modified":"Mon Sep 25 14:20:00 +0000 2018","object_id":1,"group_id":"C"}
{"modified":"Mon Sep 25 14:23:00 +0000 2018","object_id":2,"group_id":"A"}
{"modified":"Mon Sep 25 14:21:00 +0000 2018","object_id":3,"group_id":"B"}
{"modified":"Mon Sep 25 14:22:00 +0000 2018","object_id":4,"group_id":"A"}
Expected output:
{"A",2}
{"B",1}
I can't even select a date-based subset that preserves the structure of the objects. This is the best I have managed so far:
[
.modified |= strptime("%a %b %d %H:%M:%S %z %Y") |
.modified |= mktime |
.modified |= strftime("%Y-%m-%d %H:%M:%S")
] |
sort_by(.modified) |
.[] |
{modified, object_id, group_id}
For some reason, results are still unsorted.
I'm also failing to convert such a list to an array so that I can select only the N most recent entries.
After that, I will need to count the number of objects per group in some way.
Overall, looks like I need an extremely intuitive explanation on how arrays and lists of objects convert to each other, and how to modify some of their fields and, after that, to extract only fields required. The tutorials I've found so far did not help, unfortunately.
Assuming your input file contains the sample objects from the question, one per line, you can try the following:
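A minimal sketch of the slurp-based approach, not necessarily the exact command from this answer. The file name objects.json, the variable name N, and the [group, count] output shape are assumptions made for illustration; the format string uses a literal +0000 in place of %z.

```shell
# Sample input from the question, one JSON object per line.
cat > objects.json <<'EOF'
{"modified":"Mon Sep 25 14:20:00 +0000 2018","object_id":1,"group_id":"C"}
{"modified":"Mon Sep 25 14:23:00 +0000 2018","object_id":2,"group_id":"A"}
{"modified":"Mon Sep 25 14:21:00 +0000 2018","object_id":3,"group_id":"B"}
{"modified":"Mon Sep 25 14:22:00 +0000 2018","object_id":4,"group_id":"A"}
EOF

# -s slurps all objects into one array; -c prints one compact result per line.
result=$(jq -s -c --argjson N 3 '
  # Convert each timestamp to seconds since the epoch so sorting is numeric.
  map(.modified |= (strptime("%a %b %d %H:%M:%S +0000 %Y") | mktime))
  | sort_by(.modified)      # oldest first
  | reverse                 # newest first
  | .[:$N]                  # keep the N most recent objects
  | group_by(.group_id)     # array of arrays, one per group
  | .[]
  | [.[0].group_id, length] # emit [group, count]
' objects.json)

printf '%s\n' "$result"
```

With the sample input this should print ["A",2] and ["B",1] on separate lines. Every step operates on a single array, so sort_by, slicing and group_by can be chained directly; the final .[] turns the array of groups back into a stream of results.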
Note that on my system, the %z option of strptime isn't working, so I replaced it with a literal +0000 (which is not used in the time conversion anyway).
The accepted answer uses the -s command-line option, which requires that the entire input fit into memory. For very large data sets, this may not be possible.
Since the release of jq 1.5 (in 2015), an alternative has been available. Here, therefore, is a memory-efficient solution using inputs.
The key functionality is encapsulated in the following jq filter:
A solution to the problem at hand (with N==3) can now be obtained in just three additional lines:
Note that this assumes the -n command-line option is used. If it is omitted, the first line of input will be ignored.
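A minimal sketch of such a streaming solution, not necessarily the filter from this answer. The helper names epoch and most_recent, the file name objects.json, and the [group, count] output shape are hypothetical; the format string uses a literal +0000 in place of %z.

```shell
# Sample input from the question, one JSON object per line.
cat > objects.json <<'EOF'
{"modified":"Mon Sep 25 14:20:00 +0000 2018","object_id":1,"group_id":"C"}
{"modified":"Mon Sep 25 14:23:00 +0000 2018","object_id":2,"group_id":"A"}
{"modified":"Mon Sep 25 14:21:00 +0000 2018","object_id":3,"group_id":"B"}
{"modified":"Mon Sep 25 14:22:00 +0000 2018","object_id":4,"group_id":"A"}
EOF

# -n stops jq from consuming the first object itself, so inputs sees all of them.
result=$(jq -n -c --argjson N 3 '
  # Hypothetical helper: seconds since the epoch for one object.
  def epoch: .modified | strptime("%a %b %d %H:%M:%S +0000 %Y") | mktime;

  # Hypothetical helper: fold a stream into an array holding at most $n
  # objects, newest first, so memory use is bounded by $n.
  def most_recent(stream; $n):
    reduce stream as $obj ([];
      . + [$obj] | sort_by(epoch) | reverse | .[:$n]);

  most_recent(inputs; $N)
  | group_by(.group_id)
  | .[]
  | [.[0].group_id, length]
' objects.json)

printf '%s\n' "$result"
```

With the sample input this should print ["A",2] and ["B",1]. The array held by reduce never grows beyond N+1 elements, which is what makes the approach usable when the whole file does not fit into memory.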
Large N
For large datasets, if the value of N is also large, it would probably be worth the trouble to tweak the above to use jq's support for binary search (bsearch) instead of sort_by. It might similarly be worthwhile caching the mktime values.
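One possible shape for that tweak, as a sketch only. The state layout (parallel stamps and items arrays) and the helper names epoch and insert_sorted are assumptions, not part of the answer above. Each timestamp is converted once, and bsearch locates the insertion point in the sorted stamps array instead of re-sorting.

```shell
# Sample input from the question, one JSON object per line.
cat > objects.json <<'EOF'
{"modified":"Mon Sep 25 14:20:00 +0000 2018","object_id":1,"group_id":"C"}
{"modified":"Mon Sep 25 14:23:00 +0000 2018","object_id":2,"group_id":"A"}
{"modified":"Mon Sep 25 14:21:00 +0000 2018","object_id":3,"group_id":"B"}
{"modified":"Mon Sep 25 14:22:00 +0000 2018","object_id":4,"group_id":"A"}
EOF

result=$(jq -n -c --argjson N 3 '
  def epoch: .modified | strptime("%a %b %d %H:%M:%S +0000 %Y") | mktime;

  # Insert $v under key $k while keeping .stamps in ascending order.
  # bsearch returns the index if $k is present, else (-1 - insertion point).
  def insert_sorted($k; $v):
    (.stamps | bsearch($k)) as $i
    | (if $i < 0 then -1 - $i else $i end) as $pos
    | .stamps |= (.[:$pos] + [$k] + .[$pos:])
    | .items  |= (.[:$pos] + [$v] + .[$pos:]);

  reduce inputs as $obj ({stamps: [], items: []};
      ($obj | epoch) as $k                 # mktime runs once per object
      | insert_sorted($k; $obj)
      | if (.stamps | length) > $N
        then .stamps |= .[1:] | .items |= .[1:]   # drop the oldest entry
        else . end)
  | .items
  | group_by(.group_id)
  | .[]
  | [.[0].group_id, length]
' objects.json)

printf '%s\n' "$result"
```

The slicing still copies the arrays on every insertion, so this mainly avoids the repeated sort and the repeated time parsing; whether that pays off in practice depends on N and would need measuring.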