Here are some example Solr documents I got:
{
"id": "1",
"openDate": "2017-12-01T00:00:00.000Z",
"closeDate": "2017-12-04T00:00:00.000Z"
},
{
"id": "2",
"openDate": "2017-12-02T00:00:00.000Z",
"closeDate": "2017-12-04T00:00:00.000Z"
},
{
"id": "3",
"openDate": "2017-12-02T00:00:00.000Z",
"closeDate": "2017-12-06T00:00:00.000Z"
}
The dates that a document is "active" are the dates between the openDate (inclusive) and the closeDate (exclusive). I want to count the number of documents that are "active" on each day, so the output should be:
[
{
Date: 2017-12-01,
count: 1
},
{
Date: 2017-12-02,
count: 3
},
{
Date: 2017-12-03,
count: 3
},
{
Date: 2017-12-04,
count: 1
},
{
Date: 2017-12-05,
count: 1
}
]
One easy approach to solve this is to keep a multi-valued date field (say called openDates
) with all the dates in the range of interest, so we expand the documents like this:
{
"id": "1",
"openDate": "2017-12-01T00:00:00.000Z",
"closeDate": "2017-12-04T00:00:00.000Z",
"openDates": ["2017-12-01T00:00:00.000Z",
"2017-12-02T00:00:00.000Z",
"2017-12-03T00:00:00.000Z"]
},
{
"id": "2",
"openDate": "2017-12-02T00:00:00.000Z",
"closeDate": "2017-12-04T00:00:00.000Z",
"openDates": ["2017-12-02T00:00:00.000Z",
"2017-12-03T00:00:00.000Z"]
},
{
"id": "3",
"openDate": "2017-12-02T00:00:00.000Z",
"closeDate": "2017-12-06T00:00:00.000Z",
"openDates": ["2017-12-02T00:00:00.000Z",
"2017-12-03T00:00:00.000Z",
"2017-12-04T00:00:00.000Z",
"2017-12-05T00:00:00.000Z"]
}
Then I can run a facet query like this:
/select?q=*:*&facet=true&facet.field=openDates&rows=0
to get the counts I need.
Is there a better way to solve this in Solr?
Ideally, an alternate approach can help bucket by hour or minute, not just days. The above approach will have a very large multi-valued field if we go more granular. Also, is there a good way to fill the holes (i.e. missing dates) with zero counts?