Solr Date Regex Query

Posted 2019-07-08 05:32

Question:

I want to use Solr's regular expression support to query a date field. I'm trying a simple query like the following, but I get 0 results and no errors:

...?q=DATE:/200[0-9]-03-30T11\:58\:40Z/&fl=DATE

Here are some sample outputs:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="fl">id,date</str>
<str name="q">date:/.*:.*/</str>
</lst>
</lst>
<result name="response" numFound="39" start="0">
<doc>
<str name="id">1362932537549-A17C9685</str>
<date name="date">2012-10-31T14:57:53Z</date>
</doc>
<doc>
<str name="id">1362932537549-AD280D59</str>
<date name="date">2012-10-25T09:57:53Z</date>
</doc>
<doc>
<str name="id">1362932537549-B091BE97</str>
<date name="date">2012-10-23T09:57:53Z</date>
</doc>
<doc>
<str name="id">1362932537549-B0D8341C</str>
<date name="date">2012-10-22T14:57:53Z</date>
</doc>
<doc>
<str name="id">1362932537549-40083ADB</str>
<date name="date">2010-08-12T14:33:00Z</date>
</doc>
<doc>
<str name="id">1362932537549-9CA68015</str>
<date name="date">2011-07-20T12:25:02Z</date>
</doc>
...

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">8380</int>
<lst name="params">
<str name="fl">id,date</str>
<str name="q">date:/.*.*/</str>
</lst>
</lst>
<result name="response" numFound="1263" start="0">
<doc>
<str name="id">1362932537549-5A0DAFB7</str>
<date name="date">2010-08-12T14:31:00Z</date>
</doc>
<doc>
<str name="id">1362932537549-D712F1C71</str>
<date name="date">2011-12-01T13:23:53Z</date>
</doc>
<doc>
<str name="id">1362932537549-3FAA6BC</str>
<date name="date">2012-05-25T14:26:08Z</date>
</doc>
<doc>
<str name="id">1362932537549-C8A6B81F</str>
<date name="date">2010-08-12T14:25:00Z</date>
</doc>
<doc>
<str name="id">1362932537549-D712F1C8</str>
<date name="date">2011-12-01T13:23:53Z</date>
</doc>
...

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">34443</int>
<lst name="params">
<str name="fl">id,date</str>
<str name="q">date:/.*0.*/</str>
</lst>
</lst>
<result name="response" numFound="65" start="0">
<doc>
<str name="id">1362932537549-A4BC013G</str>
<date name="date">2012-10-29T17:57:53Z</date>
</doc>
<doc>
<str name="id">1362932537549-862F708G</str>
<date name="date">2013-02-14T09:48:46Z</date>
</doc>
<doc>
<str name="id">1362932537549-B8A38A74</str>
<date name="date">2013-02-14T09:49:18Z</date>
</doc>
<doc>
<str name="id">1362932537549-D4BA90CD</str>
<date name="date">2007-10-09T21:53:34Z</date>
</doc>
<doc>
<str name="id">1362932537549-3028513F</str>
<date name="date">2011-06-24T20:30:22Z</date>
</doc>

Answer 1:

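Your regex looks okay, but rather than relying on backslash escapes alone, try URL-encoding the whole query value (in the example, %5C is the encoded backslash and %3A is the encoded colon):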

?q=DATE%3A%2F200%5B0-9%5D-03-30T11%5C%3A58%5C%3A40Z%2F&fl=DATE
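
The encoded string above can be reproduced programmatically. This is a minimal sketch in Python using the standard library's urllib.parse.quote; the helper name encode_solr_query is hypothetical and not part of Solr or any client library. It only shows how the characters are percent-encoded and says nothing about whether Solr will then match the date field.

```python
from urllib.parse import quote


def encode_solr_query(raw_query: str) -> str:
    """Percent-encode a raw Solr query string for use in a URL.

    Hypothetical helper for illustration. safe="" forces "/" to be
    encoded as well, since quote() leaves "/" untouched by default.
    """
    return quote(raw_query, safe="")


# The raw query from the question, including the backslash escapes.
raw = r"DATE:/200[0-9]-03-30T11\:58\:40Z/"
encoded = encode_solr_query(raw)

# Build the query-string fragment in the same shape as the question's URL.
print(f"?q={encoded}&fl=DATE")
```

Passing safe="" matters here: with the default, the slashes that delimit the regex would be left unencoded.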

(Migrated from a comment on the question.)

It seems a date field cannot be queried with a regex, at least not directly.

As you found, even queries such as date:/.*_.*/, date:/.*,.*/, and date:/.*A.*/ return results, although the timestamps clearly contain none of those characters. I think this happens because date is not a string field: the regex runs against the field's encoded (for example, raw binary) representation, so a query for a character like : matches documents whose encoded data happens to contain that byte. In layman's terms, it is like opening a binary file, such as an executable, in Notepad and searching for an ASCII character.

This would also explain why all of those queries return about the same number of results, 20 to 30: statistically speaking, a regex for an arbitrary ASCII character run against binary (or otherwise encoded) data should match at about the same rate whichever character is chosen.
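
The idea can be illustrated with a small Python sketch. It assumes, purely for illustration, that a date is stored as epoch milliseconds packed into 8 big-endian bytes; this is not Solr's actual index encoding, and the names encode_date and raw_contains are hypothetical. The point is only that a byte-level search can "find" a colon where the numeric value happens to contain the byte 0x3A, and miss it in a date whose text form plainly contains colons.

```python
import calendar
import struct
import time

DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def encode_date(iso: str) -> bytes:
    """Assumed encoding: UTC epoch milliseconds as 8 big-endian bytes."""
    seconds = calendar.timegm(time.strptime(iso, DATE_FORMAT))
    return struct.pack(">q", seconds * 1000)


def raw_contains(iso: str, ch: str) -> bool:
    """Search for an ASCII character in the encoded bytes, not in the text."""
    return ch.encode("ascii") in encode_date(iso)


# 15 000 ms is 0x3A98, and 0x3A is the ASCII code for ":".
print(encode_date("1970-01-01T00:00:15Z"))  # b'\x00\x00\x00\x00\x00\x00:\x98'

# The text form of both dates contains ":", but only one encoded form does.
print(raw_contains("1970-01-01T00:00:15Z", ":"))  # True
print(raw_contains("1970-01-01T00:00:00Z", ":"))  # False
```

Under this assumed encoding, whether a character matches depends on the numeric value of the timestamp rather than on how the date is displayed, which is consistent with the seemingly random matches described above.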