Compound Queries with Redis

2019-04-09 13:13发布

问题:

For learning purposes I'm trying to write a simple structured document store in Redis. In my example application I'm indexing millions of documents that look a little like the following.

<book id="1234">
    <title>Quick Brown Fox</title>
    <year>1999</year>
    <isbn>309815</isbn>
    <author>Fred</author>
</book>

I'm writing a little query language that allows me to say YEAR = 1999 AND TITLE="Quick Brown Fox" (again, just for my learning, I don't care that I'm reinventing the wheel!) and this should return the ID's of the matching documents (1234 in this case). The AND and OR expressions can be arbitrarily nested.

For each document I'm generating keys as follows

BOOK_TITLE.QUICK_BROWN_FOX = 1234
BOOK_YEAR.1999 = 1234

I'm using SADD to plop these documents in a series of sets in the form KEYNAME.VALUE = { REFS }.

When I do the querying, I parse the expression into an AST. A simple expression such as YEAR=1999 maps directly to a SMEMBERS command which gets me the set of matching documents back. However, I'm not sure how to most efficiently perform the AND and OR parts.

Given a query such as:

(TITLE=Dental Surgery OR TITLE=DIY Appendectomy)
    AND
(YEAR = 1999 AND AUTHOR = FOO)

I currently make the following requests to Redis to answer these queries.

-- Stage one generates the intermediate results and returns RANDOM_GENERATED_KEY3
SUNIONSTORE RANDOMLY_GENERATED_KEY1 BOOK_TITLE.DENTAL_SURGERY BOOK_TITLE.DIY_APPENDECTOMY
SINTERSTORE RANDOMLY_GENERATED_KEY2 BOOK_YEAR.1999 BOOK_YEAR.1998
SINTERSTORE RANDOMLY_GENERATED_KEY3 RANDOMLY_GENERATED_KEY1 RANDOMLY_GENERATED_KEY2

-- Retrieving the top level results just requires the last key generated
SMEMBERS RANDOMLY_GENERATED_KEY3

When I encounter an AND I use SINTERSTORE based on the two child keys (and similarly for OR I use SUNIONSTORE). I randomly generate a key to store the results in (and set a short TTL so I don't fill Redis up with cruft). By the end of this series of commands the return value is a key that I can use to retrieve the results with SMEMBERS. The reason I've used the store functions is that I don't want to transport all the matching document references back to the server, so I use temporary keys to store the result on the Redis instance and then only bring back the matching results at the end.

My question is simply, is this the best way to make use of Redis as a document store?

回答1:

I'm using a similar approach with sorted sets to implement full text indexing. The overall approach is good, though there are a couple of fairly simple improvements you could make.

  • Rather than using randomly generated keys, you can use the query (or a short form thereof) as the key. That lets you reuse the sets that have already been calculated, which could significantly improve performance if you have queries across two large sets that are commonly combined in similar ways.
  • Handling title as a complete string will result in a very large number of single member sets. It may be better to index individual words in the title and filter the final results for an exact match if you really need it.