I'm very new to Solr and I'm evaluating it. My task is to look for words within a corpus of books and return them within a small context. So far, I'm storing the books in a database split into paragraphs (slicing the books at line breaks); I run a full-text search and return the matching row.
In Solr, would I have to do the same, or can I add the whole book (in .txt format) and, whenever a match is found, return the match plus roughly 100 words before and 100 words after it?
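To pin down the requirement, here is a minimal plain-Python sketch of the output being asked for. It does not use Solr at all. The function name `keyword_in_context` and the whitespace-based notion of a "word" are illustrative assumptions.

```python
import re


def keyword_in_context(text, word, window=100):
    """Return each occurrence of `word` with up to `window` words on each side.

    Hypothetical helper: "words" are simply whitespace-separated tokens.
    """
    tokens = text.split()
    target = word.lower()
    snippets = []
    for i, tok in enumerate(tokens):
        # Compare case-insensitively, ignoring punctuation attached to the token.
        if re.sub(r"\W", "", tok).lower() == target:
            start = max(0, i - window)
            snippets.append(" ".join(tokens[start:i + window + 1]))
    return snippets
```

For example, `keyword_in_context("Call me Ishmael. Whale, ho!", "whale", window=1)` yields `["Ishmael. Whale, ho!"]`.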
Highlighting will do your bidding. http://wiki.apache.org/solr/HighlightingParameters
Here are the relevant options for you. For what you describe, set it to return 5 (or whatever a human can sanely handle) snippets (`hl.snippets`) from the `text` field (`hl.fl`), and set the length of each snippet to 400 characters (`hl.fragsize`; my approximation of 100 words) around the word/phrase. See also `hl.regex.slop` for building snippets around phrases and `hl.simple.pre`/`hl.simple.post` for markup.
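A minimal sketch of such a request, using only Python's standard library. The core name `books`, the address `localhost:8983`, and the `<em>` markup are assumptions; the `text` field and the 5 × 400-character settings come from the answer above. The function only builds the URL, so you can send it with any HTTP client.

```python
from urllib.parse import urlencode


def build_highlight_query(term, base_url="http://localhost:8983/solr/books/select"):
    """Build a Solr select URL that asks for highlighted snippets.

    The core name "books" in base_url is a hypothetical placeholder.
    """
    params = {
        "q": f"text:{term}",        # search the full-book field
        "hl": "true",               # turn highlighting on
        "hl.fl": "text",            # field to build snippets from
        "hl.snippets": 5,           # at most 5 snippets per document
        "hl.fragsize": 400,         # ~400 characters (about 100 words) per snippet
        "hl.simple.pre": "<em>",    # markup inserted before each match
        "hl.simple.post": "</em>",  # markup inserted after each match
        "wt": "json",
    }
    return f"{base_url}?{urlencode(params)}"
```

The snippets come back in the `highlighting` section of the response, keyed by document id. Note that the standard highlighter needs `text` to be a stored field.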