I have crawled websites using Nutch and pushed the crawled data to Solr. Now I want to search only the content inside specific tags that have a specific attribute value. For example:
<h><title> title to search </title></h>
<div id="abc">
content to search
</div>
<div class="efg">
other content to search
</div>
I have seen this question (how to parse html with nutch and index specific tag to solr?), but it is not clear enough for me.
I want to know whether a plugin for this already exists or whether I need to write a custom plugin from scratch. If I have to write a plugin, I just need a few pointers on handling HTML tags and attributes.
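To make the goal concrete, here is a rough sketch of the extraction I have in mind, written outside Nutch with only Python's standard `html.parser`. It is not Nutch or Solr code. The names `TagTextExtractor`, `extract`, and `SAMPLE_HTML` are made up for illustration, and it assumes the attribute value must match exactly (so a multi-valued `class` would not match). In Nutch I would expect to do the equivalent inside a plugin and store each result in its own Solr field.

```python
from html.parser import HTMLParser

# The sample markup from the question.
SAMPLE_HTML = """<h><title> title to search </title></h>
<div id="abc">
content to search
</div>
<div class="efg">
other content to search
</div>"""


class TagTextExtractor(HTMLParser):
    """Collects the text inside elements with a given tag name and,
    optionally, a given attribute value (exact match; an assumption)."""

    def __init__(self, tag, attr=None, value=None):
        super().__init__()
        self.tag, self.attr, self.value = tag, attr, value
        self.depth = 0      # > 0 while inside a matching element
        self.buffer = []    # text pieces of the current match
        self.results = []   # one string per matching element

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            # Track nested elements of the same name so the right
            # closing tag ends the match.
            if tag == self.tag:
                self.depth += 1
        elif tag == self.tag and (
            self.attr is None or dict(attrs).get(self.attr) == self.value
        ):
            self.depth = 1
            self.buffer = []

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if self.depth > 0 and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.buffer).strip())


def extract(html, tag, attr=None, value=None):
    """Returns the stripped text of every matching element, in document order."""
    parser = TagTextExtractor(tag, attr, value)
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__":
    print(extract(SAMPLE_HTML, "title"))
    print(extract(SAMPLE_HTML, "div", "id", "abc"))
    print(extract(SAMPLE_HTML, "div", "class", "efg"))
```

So for the sample above I would want three separately searchable values: the title text, the text of the div with id "abc", and the text of the div with class "efg".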