Using this command
curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true' -F "myfile=@maven_tutorial.pdf"
we can index a single PDF file in Solr by specifying our own id (doc1). But I want to index many PDF files into Solr at once and let Solr generate the ids automatically.
Please help me.
You can use a UUID field as the unique key.
First define the UUID field type
<fieldType name="uuid" class="solr.UUIDField" indexed="true" />
Add your id field to schema.xml
<field name="id" type="uuid" indexed="true" stored="true" multiValued="false"/>
Make this field the unique key
<uniqueKey>id</uniqueKey>
In solrconfig.xml, define an update chain that autogenerates the id
<updateRequestProcessorChain name="uuid">
<processor class="solr.UUIDUpdateProcessorFactory">
<str name="fieldName">id</str>
</processor>
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
Now attach this update chain to the request handler that extracts the content from the PDF files you submit to Solr.
<requestHandler name="/update/extract"
startup="lazy"
class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults">
<str name="lowernames">true</str>
<str name="uprefix">ignored_</str>
<str name="captureAttr">true</str>
<str name="fmap.a">links</str>
<str name="fmap.div">ignored_</str>
<str name="update.chain">uuid</str>
</lst>
</requestHandler>
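With the chain attached to the handler, you should no longer need to pass literal.id, so indexing many files comes down to posting each one in a loop. Here is a minimal sketch of that, assuming the same Solr URL as in the question; the names SOLR_URL, CURL, index_pdf and index_dir are made up for illustration and are not Solr settings.

```shell
#!/bin/sh
# Sketch: post every PDF in a directory to Solr's extract handler.
# SOLR_URL, CURL, index_pdf and index_dir are illustrative names (assumptions).

SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/update/extract}"
CURL="${CURL:-curl}"   # overridable so the loop can be dry-run with a stub

# Post one file. No literal.id is sent, so the "uuid" chain assigns the id.
# commit=true on every request mirrors the original command; simple but slow.
index_pdf() {
  "$CURL" --silent --show-error --fail "$SOLR_URL?commit=true" -F "myfile=@$1"
}

# Post every *.pdf directly inside the given directory and report the count.
index_dir() {
  dir="$1"
  count=0
  for f in "$dir"/*.pdf; do
    [ -e "$f" ] || continue            # the glob matched nothing
    if index_pdf "$f"; then
      count=$((count + 1))
    else
      echo "failed: $f" >&2
    fi
  done
  echo "indexed $count file(s)"
}

# Run only when a directory is passed on the command line.
if [ "$#" -gt 0 ]; then
  index_dir "$1"
fi
```

Saved under a name of your choice (say index_pdfs.sh), it would be run as `sh index_pdfs.sh /path/to/pdfs`. It does not recurse into subdirectories, and committing on every request is kept only because the original command does it; for large batches you would probably want to commit less often.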