In MongoDB, given a find() call that returns a cursor over a set of rows (documents), what is an idiomatic, time-efficient way to also return "context" rows, i.e. the rows immediately before and/or after each row in the set?
The easiest way for me to explain this concept is with ack, which supports context searching. Given a file:
line 1
line 2
line 3
line 4
line 5
line 6
line 7
line 8
This is the output from ack:
C:\temp>ack.pl -C 2 "line 4" test.txt
line 2
line 3
line 4
line 5
line 6
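To pin down what "context" means here, this is a minimal in-memory sketch of what ack's -C 2 does, written in JavaScript to match the shell snippets below. contextGroups is a hypothetical helper, not part of ack or MongoDB: for each matching line it keeps n lines on either side and merges windows that overlap or touch, roughly as ack and grep print them.

```javascript
// Hypothetical helper modelling `ack -C n`: keep n lines before and after
// every match, merging windows that overlap or are adjacent.
function contextGroups(lines, pattern, n) {
  const re = new RegExp(pattern); // no "g" flag, so test() is stateless
  const groups = [];
  let start = -1;
  let end = -1;
  lines.forEach((line, i) => {
    if (!re.test(line)) return;
    const lo = Math.max(0, i - n);
    const hi = Math.min(lines.length - 1, i + n);
    if (start !== -1 && lo <= end + 1) {
      end = Math.max(end, hi); // overlaps or touches the open window: extend it
    } else {
      if (start !== -1) groups.push(lines.slice(start, end + 1));
      start = lo;
      end = hi;
    }
  });
  if (start !== -1) groups.push(lines.slice(start, end + 1));
  return groups;
}

const file = ["line 1", "line 2", "line 3", "line 4", "line 5", "line 6", "line 7", "line 8"];
console.log(contextGroups(file, "line 4", 2)); // one group: "line 2" .. "line 6"
```

The database version of this question is the same operation, except that "lines" are documents and "before/after" is defined by some sort key.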
I am storing log data in a MongoDB collection, one document per row. Each log row is tokenized into keywords and these keywords are indexed, which gives me cheap-ish full-text searching.
I execute a bog-standard query:
collection.find({keywords: {'$all': ['key1', 'key2']}}, {}).sort({datetime: -1});
and get a cursor. At this stage, without adding any fields to the documents, how do I get the context? I think the flow is something like:
- For each row in the cursor:
  - Get the _id field and store it in x.
  - Execute: collection.find({_id: {'$gt': x}}).sort({_id: 1}).limit(N)
  - Read that cursor's results: the N rows after the hit.
  - Execute: collection.find({_id: {'$lt': x}}).sort({_id: -1}).limit(N)
  - Read that cursor's results and reverse them: the N rows before the hit, in reading order.
For a result set with R rows this requires 2R+1 queries.
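The per-hit flow can be sketched in JavaScript, written in the synchronous style of the mongo shell (with the Node.js driver, each toArray() returns a Promise and would need an await). Two details matter: the '$gt' query needs an explicit sort({_id: 1}), because a find() without a sort has no guaranteed order, and the '$lt' query must sort descending and then reverse, otherwise limit(N) returns the N oldest rows rather than the N nearest. Comparing _ids assumes _id order follows insertion order, which default ObjectIds only approximately guarantee. MemCollection and MemCursor are hypothetical in-memory stand-ins so the sketch runs without a server, and numeric _ids stand in for ObjectIds.

```javascript
// Hypothetical in-memory stand-in for a collection, just enough to run the
// sketch without a server: single-field filters using $gt, $lt or $all.
class MemCursor {
  constructor(docs) {
    this.docs = docs;
  }
  sort(spec) {
    const [field, dir] = Object.entries(spec)[0];
    this.docs = [...this.docs].sort((a, b) =>
      a[field] < b[field] ? -dir : a[field] > b[field] ? dir : 0);
    return this;
  }
  limit(n) {
    this.docs = this.docs.slice(0, n);
    return this;
  }
  toArray() {
    return this.docs;
  }
}

class MemCollection {
  constructor(docs) {
    this.docs = docs;
    this.queryCount = 0; // counts find() calls, i.e. round trips
  }
  find(filter) {
    this.queryCount += 1;
    const [field, cond] = Object.entries(filter)[0];
    return new MemCursor(this.docs.filter((d) => {
      const v = d[field];
      if ("$gt" in cond) return v > cond.$gt;
      if ("$lt" in cond) return v < cond.$lt;
      if ("$all" in cond) return cond.$all.every((k) => v.includes(k));
      return false;
    }));
  }
}

// The 2R+1 flow: one keyword query, then two neighbour queries per hit.
function findWithContext(coll, keywords, n) {
  const hits = coll.find({ keywords: { $all: keywords } }).sort({ datetime: -1 }).toArray();
  return hits.map((hit) => ({
    hit,
    // Nearest n rows before the hit: sort descending, limit, then reverse
    // back into reading order.
    before: coll.find({ _id: { $lt: hit._id } }).sort({ _id: -1 }).limit(n).toArray().reverse(),
    // Nearest n rows after the hit.
    after: coll.find({ _id: { $gt: hit._id } }).sort({ _id: 1 }).limit(n).toArray(),
  }));
}

// Sample data: rows 4 and 7 carry both keywords.
const docs = [1, 2, 3, 4, 5, 6, 7, 8].map((i) => ({
  _id: i,
  datetime: i,
  keywords: i === 4 || i === 7 ? ["key1", "key2"] : ["other"],
}));
const coll = new MemCollection(docs);
const result = findWithContext(coll, ["key1", "key2"], 2);
console.log(coll.queryCount); // → 5 (2R+1 with R = 2)
```

The query count grows linearly with the number of hits, which is the cost the rest of the question tries to avoid.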
However, I think I can trade space for time. Would a feasible alternative be to update each row, in the background, with the _ids of its context rows? For a given row that currently has the fields:
_id, contents, keywords
I would add an additional field:
_id, contents, keywords, context_ids
and then somehow use these context_ids in a subsequent search. I'm not at all familiar with MongoDB MapReduce yet; could it come into the picture as well?
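A minimal sketch of the context_ids idea, under stated assumptions: the background job sees the row _ids already sorted in insertion order, and its output is an array of operations in the { updateOne: { filter, update } } shape that bulkWrite() accepts in the shell and the Node.js driver. contextIdUpdates and contextFilter are hypothetical helpers. At search time the hits' context_ids can be unioned into a single $in query, so R hits would cost two queries instead of 2R+1.

```javascript
// Hypothetical background step: give every row a context_ids array holding
// the _ids of its n neighbours on each side (ASSUMES sortedIds is in
// insertion order). The result is meant to be passed to bulkWrite().
function contextIdUpdates(sortedIds, n) {
  return sortedIds.map((id, i) => ({
    updateOne: {
      filter: { _id: id },
      update: {
        $set: {
          context_ids: [
            ...sortedIds.slice(Math.max(0, i - n), i), // up to n before
            ...sortedIds.slice(i + 1, i + 1 + n),      // up to n after
          ],
        },
      },
    },
  }));
}

// Hypothetical search-time step: union the hits' context_ids into ONE filter.
function contextFilter(hits) {
  const ids = new Set();
  for (const hit of hits) for (const id of hit.context_ids) ids.add(id);
  return { _id: { $in: [...ids] } };
}

const ops = contextIdUpdates([1, 2, 3, 4, 5, 6, 7, 8], 2);
console.log(JSON.stringify(ops[3]));
// → {"updateOne":{"filter":{"_id":4},"update":{"$set":{"context_ids":[2,3,5,6]}}}}
```

Two consequences of this design: the $in results come back as one flat set, so the client has to regroup them per hit using each hit's context_ids, and because every appended row becomes a neighbour of the n rows before it, those rows' context_ids must be refreshed as new rows arrive.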
I think the most direct approach is to store the full text of the actual context rows in each row, but this seems a bit crude to me. The clear advantage is that a single query would return all the context I need.
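For comparison, a sketch of the denormalized variant, assuming the rows are available in insertion order when the documents are built. withEmbeddedContext and the fields context_before and context_after are hypothetical names. Each document carries copies of its neighbours' contents, so the keyword query alone returns everything.

```javascript
// Hypothetical denormalization: copy the n neighbours' contents into each row
// so that the keyword query itself returns the context.
function withEmbeddedContext(rows, n) {
  return rows.map((row, i) => ({
    ...row,
    context_before: rows.slice(Math.max(0, i - n), i).map((r) => r.contents),
    context_after: rows.slice(i + 1, i + 1 + n).map((r) => r.contents),
  }));
}

const rows = ["line 1", "line 2", "line 3", "line 4", "line 5", "line 6", "line 7", "line 8"]
  .map((contents, i) => ({ _id: i + 1, contents }));
const enriched = withEmbeddedContext(rows, 2);
console.log(enriched[3].context_before, enriched[3].contents, enriched[3].context_after);
```

The cost is storage: each row's contents end up stored up to 2n+1 times (once in its own document and once in each of up to 2n neighbours), and the same refresh-on-append issue applies as with stored _ids.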
I appreciate any and all answers that accept the scope of the question. I realise I could use Lucene or a real full-text search engine out-of-band but I'm trying to feel out the edges and capabilities of MongoDB so I'd appreciate MongoDB-specific answers. Thanks!