I implemented a simple tool that adds PDF documents to Solr. First I create a master document (SolrInputDocument) for each PDF. It gets fields like author, filehash, keywords, 'content_type=document' and so on.
After that I generate a SolrInputDocument for every page. Each of these gets an id like 'parentID_p01', the page text as a field value, 'content_type=page', etc.
Finally, I add all page documents to the master document with addChildDocument().
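To make the layout concrete, here is a minimal sketch of that structure in plain Java. It uses maps as stand-ins, so it runs without SolrJ. In the real tool each map is a SolrInputDocument and the pages are attached with addChildDocument(). The class name, the "_children" key and the "text" field name are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Plain-Java stand-in for the nested structure described above.
final class NestedPdfSketch {

    // Builds a page id such as "parentID_p01" (page number zero-padded to two digits).
    static String pageId(String parentId, int pageNumber) {
        return String.format(Locale.ROOT, "%s_p%02d", parentId, pageNumber);
    }

    // Returns the master document fields. The pages are stored under the
    // hypothetical key "_children", which only exists in this illustration.
    static Map<String, Object> buildMaster(String parentId, String author, List<String> pageTexts) {
        Map<String, Object> master = new LinkedHashMap<>();
        master.put("id", parentId);
        master.put("author", author);
        master.put("content_type", "document");

        List<Map<String, Object>> pages = new ArrayList<>();
        for (int i = 0; i < pageTexts.size(); i++) {
            Map<String, Object> page = new LinkedHashMap<>();
            page.put("id", pageId(parentId, i + 1));   // pages are numbered from 1
            page.put("content_type", "page");
            page.put("text", pageTexts.get(i));        // assumed field name
            pages.add(page);
        }
        master.put("_children", pages);
        return master;
    }
}
```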
Now my question is: how do I search for a given word in all pages of all documents and get a result like this?
Document1.pdf 'this is my doc1 title' [2 matches]
[Page 14] 'Example phrase for special a <em>word</em> given by....
[Page 55] 'another <em>word</em> for this test
Document2.pdf 'doc2 title' [X matches]
[Page 1] 'given <em>word</em> in this text
[Page 2] '<em>words</em> hit more than fists
[Page 99] 'some <em>words</em> of wisdom
My first idea was to simply search with 'text:word~' and then group by the parent document, but I didn't find a good way to do it :-(
It seems that nested documents are still fairly new in Solr, and I didn't find an easy solution with SolrJ.
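For illustration, the grouping step from that first idea could be done on the client side roughly as follows. This is only a sketch in plain Java. It assumes that the page hits come back as a list of page ids and that every page id follows the 'parentID_p01' scheme, so the parent id can be recovered by cutting at the last "_p". The class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PageHitGrouping {

    // Derives the parent id from a page id like "parentID_p01".
    // Ids without a "_p" suffix are returned unchanged.
    static String parentIdOf(String pageId) {
        int cut = pageId.lastIndexOf("_p");
        return cut < 0 ? pageId : pageId.substring(0, cut);
    }

    // Groups page ids by parent id, keeping the order in which the hits arrived.
    static Map<String, List<String>> groupByParent(List<String> pageIds) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String pageId : pageIds) {
            groups.computeIfAbsent(parentIdOf(pageId), k -> new ArrayList<>()).add(pageId);
        }
        return groups;
    }
}
```

The number of matches per document is then the size of each list, and the master documents still have to be fetched separately to get titles and file names.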
Thanks in advance.
I have created data in Solr in the format below, with a parent-child relation, where one insurance account covers the vehicle insurance of multiple persons. One person can have multiple vehicles such as a car, a bike, etc. I have modeled the person as the parent document and each vehicle as a child document.
In the Java code below, I have used SolrJ 4.9 to create the documents and run a search query against Solr. I process the QueryResponse to show the required result and also give the Solr query URL.
You can use the given code snippet as a starting point; let me know whether it works for you.
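As a minimal sketch of the query side only, the following plain-Java helper builds block join query strings with Solr's `{!parent which=...}` and `{!child of=...}` query parsers and turns them into a select URL. The field names (`content_type`, `vehicle_type`), the base URL and the class itself are assumptions for illustration. With SolrJ, the same query string could be passed to `new SolrQuery(...)`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class BlockJoinQuerySketch {

    // Parents (e.g. persons) that have at least one child matching childQuery.
    // parentFilter must match ALL parent documents, e.g. "content_type:person".
    static String parentsWithChild(String parentFilter, String childQuery) {
        return "{!parent which=\"" + parentFilter + "\"}" + childQuery;
    }

    // Children (e.g. vehicles) whose parent matches parentQuery.
    static String childrenOfParent(String parentFilter, String parentQuery) {
        return "{!child of=\"" + parentFilter + "\"}" + parentQuery;
    }

    // Builds a select URL for the given query; baseUrl is a hypothetical core URL.
    static String selectUrl(String baseUrl, String query) {
        try {
            return baseUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8") + "&wt=json";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, `parentsWithChild("content_type:person", "vehicle_type:car")` yields `{!parent which="content_type:person"}vehicle_type:car`, which returns the persons who own at least one car. Applied to the PDF case, the parent filter would be `content_type:document` and the child query the page text search.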