Solr - count documents in the range of two date fi

Here are some example Solr documents I have:

{
  "id": "1",
  "openDate": "2017-12-01T00:00:00.000Z",
  "closeDate": "2017-12-04T00:00:00.000Z"
},
{
  "id": "2",
  "openDate": "2017-12-02T00:00:00.000Z",
  "closeDate": "2017-12-04T00:00:00.000Z"
},
{
  "id": "3",
  "openDate": "2017-12-02T00:00:00.000Z",
  "closeDate": "2017-12-06T00:00:00.000Z" 
}

The dates that a document is "active" are the dates between the openDate (inclusive) and the closeDate (exclusive). I want to count the number of documents that are "active" on each day, so the output should be:

[
  {
    Date: 2017-12-01,
    count: 1
  },
  {
    Date: 2017-12-02,
    count: 3
  },
  {
    Date: 2017-12-03,
    count: 3
  },
  {
    Date: 2017-12-04,
    count: 1
  },
  {
    Date: 2017-12-05,
    count: 1
  }
]
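As a sanity check, the expected counts can be computed directly from the definition (openDate inclusive, closeDate exclusive). The following is a minimal brute-force sketch in Python, not a Solr feature; the helper names `parse` and `active_counts` and the query window are made up for illustration. It also emits zero counts for days on which nothing is active.

```python
from datetime import datetime, timedelta, timezone

def parse(ts):
    # Parse timestamps in the format used by the sample documents.
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)

DOCS = [
    {"id": "1", "openDate": "2017-12-01T00:00:00.000Z", "closeDate": "2017-12-04T00:00:00.000Z"},
    {"id": "2", "openDate": "2017-12-02T00:00:00.000Z", "closeDate": "2017-12-04T00:00:00.000Z"},
    {"id": "3", "openDate": "2017-12-02T00:00:00.000Z", "closeDate": "2017-12-06T00:00:00.000Z"},
]

def active_counts(docs, start, end, gap):
    """Count the documents active at each step t in [start, end), stepping by gap."""
    intervals = [(parse(d["openDate"]), parse(d["closeDate"])) for d in docs]
    counts = {}
    t = start
    while t < end:
        # Active means openDate <= t < closeDate (close is exclusive).
        counts[t] = sum(1 for opened, closed in intervals if opened <= t < closed)
        t += gap
    return counts

# Hypothetical window that is one day wider than the data on each side,
# so the first and last buckets come out as zero.
window_start = parse("2017-11-30T00:00:00.000Z")
window_end = parse("2017-12-07T00:00:00.000Z")
for day, n in active_counts(DOCS, window_start, window_end, timedelta(days=1)).items():
    print(day.date(), n)
```

Passing `timedelta(hours=1)` or `timedelta(minutes=1)` as the gap gives finer buckets, at the cost of scanning every document for every bucket.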

One easy way to solve this is to keep a multi-valued date field (say, openDates) containing every date in the range of interest, so the documents are expanded like this:

  {
    "id": "1",
    "openDate": "2017-12-01T00:00:00.000Z",
    "closeDate": "2017-12-04T00:00:00.000Z",
    "openDates": ["2017-12-01T00:00:00.000Z",
                  "2017-12-02T00:00:00.000Z",
                  "2017-12-03T00:00:00.000Z"]
  },
  {
    "id": "2",
    "openDate": "2017-12-02T00:00:00.000Z",
    "closeDate": "2017-12-04T00:00:00.000Z",
    "openDates": ["2017-12-02T00:00:00.000Z",
                  "2017-12-03T00:00:00.000Z"]    
  },
  {
    "id": "3",
    "openDate": "2017-12-02T00:00:00.000Z",
    "closeDate": "2017-12-06T00:00:00.000Z",
    "openDates": ["2017-12-02T00:00:00.000Z",
                  "2017-12-03T00:00:00.000Z",
                  "2017-12-04T00:00:00.000Z",
                  "2017-12-05T00:00:00.000Z"]    
  }
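The expansion could be done at index time with something like the following Python sketch. The function names `fmt` and `expand_open_dates` are hypothetical, and the `Counter` at the end only emulates what faceting on openDates would count; it is not how Solr computes facets.

```python
from collections import Counter
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%S.%fZ"

def fmt(dt):
    # Render with millisecond precision, matching the sample documents.
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"

def expand_open_dates(doc, gap=timedelta(days=1)):
    """Return a copy of doc with openDates covering [openDate, closeDate)."""
    t = datetime.strptime(doc["openDate"], FMT)
    close = datetime.strptime(doc["closeDate"], FMT)
    dates = []
    while t < close:  # closeDate itself is excluded
        dates.append(fmt(t))
        t += gap
    return {**doc, "openDates": dates}

DOCS = [
    {"id": "1", "openDate": "2017-12-01T00:00:00.000Z", "closeDate": "2017-12-04T00:00:00.000Z"},
    {"id": "2", "openDate": "2017-12-02T00:00:00.000Z", "closeDate": "2017-12-04T00:00:00.000Z"},
    {"id": "3", "openDate": "2017-12-02T00:00:00.000Z", "closeDate": "2017-12-06T00:00:00.000Z"},
]

expanded = [expand_open_dates(d) for d in DOCS]

# Emulate the per-value counts that faceting on openDates would report.
facet = Counter(value for d in expanded for value in d["openDates"])

# Field size for document 3 (a 4-day span) at finer granularities.
hourly = len(expand_open_dates(DOCS[2], timedelta(hours=1))["openDates"])
per_minute = len(expand_open_dates(DOCS[2], timedelta(minutes=1))["openDates"])
print(hourly, per_minute)
```

The last two numbers show how quickly the field grows: a single 4-day document needs 96 values at hourly granularity and 5760 at minute granularity.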

Then I can run a facet query like this:

/select?q=*:*&facet=true&facet.field=openDates&rows=0

to get the counts I need.

Is there a better way to solve this in Solr?

Ideally, an alternative approach would also let me bucket by hour or minute, not just by day. With the approach above, the multi-valued field becomes very large at finer granularity. Also, is there a good way to fill the holes (i.e. missing dates) with zero counts?

DateRangeField comes to the rescue. In the schema, add something like this:

<fieldType name="range_date" class="solr.DateRangeField" />
<field name="active" type="range_date" indexed="true" stored="false"/>

You can specify the active range like this:

doc1.addField("active", "[2017-12-01T00:00:00.000Z TO 2017-12-04T00:00:00.000Z]")

and later request range facets on this field.

Example parameters with 1-day granularity (change the gap parameter for other granularities):

      q.add("facet", "true")
      q.add("facet.range", "active")
      q.add("facet.range.start", "NOW/MONTH")
      q.add("facet.range.end", "NOW/MONTH+1MONTH")
      q.add("facet.range.include", "outer")
      q.add("facet.range.gap", "+1DAY")

I added facet.range.include=outer so that the buckets include neither their lower nor their upper bounds, which keeps the response in exactly the format you asked for. You can change this parameter to whatever suits you better.
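If you send these parameters over plain HTTP rather than through a client library, the `+` in the gap and end values has to be percent-encoded, because a raw `+` in a query string is decoded as a space. Here is a small Python sketch that builds the query string; the `/select` path, `q=*:*` and `rows=0` are carried over from the question, and everything else mirrors the parameters above.

```python
from urllib.parse import urlencode, parse_qs

params = [
    ("q", "*:*"),
    ("rows", "0"),
    ("facet", "true"),
    ("facet.range", "active"),
    ("facet.range.start", "NOW/MONTH"),
    ("facet.range.end", "NOW/MONTH+1MONTH"),
    ("facet.range.include", "outer"),
    ("facet.range.gap", "+1DAY"),
]

# urlencode percent-encodes '+' as %2B, so the date math survives decoding.
query_string = urlencode(params)
url = "/select?" + query_string
print(url)

# The pitfall: a hand-written '+' is decoded as a space on the server side.
broken = parse_qs("facet.range.gap=+1DAY")["facet.range.gap"][0]
intact = parse_qs(query_string)["facet.range.gap"][0]
print(repr(broken), repr(intact))
```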

You will get exactly what you need:

2017-12-01T00:00:00Z
1
2017-12-02T00:00:00Z
3
2017-12-03T00:00:00Z
3
2017-12-04T00:00:00Z
1
2017-12-05T00:00:00Z
1

Full example of the code is here.
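To turn the date/count pairs from the response into the list format asked for in the question, something like this Python sketch may help. It assumes the counts arrive as one flat list alternating between bucket start and count, as in the output above (in a JSON response this is usually found under facet_counts, facet_ranges, active, counts, but check your own response); `pair_counts` is a made-up helper name.

```python
def pair_counts(flat):
    """Turn [value, count, value, count, ...] into [{"Date": ..., "count": ...}, ...]."""
    if len(flat) % 2:
        raise ValueError("expected an even number of entries")
    return [
        # Keep only the YYYY-MM-DD part of each bucket start.
        {"Date": value[:10], "count": count}
        for value, count in zip(flat[0::2], flat[1::2])
    ]

# The values shown in the response above, as a flat list.
flat = [
    "2017-12-01T00:00:00Z", 1,
    "2017-12-02T00:00:00Z", 3,
    "2017-12-03T00:00:00Z", 3,
    "2017-12-04T00:00:00Z", 1,
    "2017-12-05T00:00:00Z", 1,
]

for row in pair_counts(flat):
    print(row)
```

For hourly or per-minute buckets, slice the timestamp less aggressively (or not at all) instead of keeping only the first ten characters.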
