Solr Index PDF documents and post them to a remote

Posted 2019-08-05 06:40

Question:

Hi, I am a novice user when it comes to Solr. Please guide me on the following hurdles.

1) Solr Index PDF documents

Solution tried

I used tika-app 0.9.jar to extract the content of the input PDF files into text files. Now I am trying to write Java code to index the documents into Solr.

2) Post them to a remote server

I need to post either the documents or the index to a central remote server. Can the curl command be used for this?

Regards, Balaji.

Answer 1:

1) Solr Index PDF documents - I believe Solr does this for you, so the separate Tika extraction step should not be needed. You can send the files through Solr's HTTP interface or through SolrJ.

2) Post the index to a remote server - Solr replication may fit the bill.
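As a rough illustration of the HTTP route, here is a minimal Java sketch that posts a PDF to Solr's extracting request handler (`/update/extract`, also known as Solr Cell) using only the JDK's `java.net.http` client (Java 11+). It assumes that handler is enabled in your `solrconfig.xml`; the class name `SolrPdfPoster`, the base URL, and the core name `docs` are made up for the example, not taken from the question. Because it is plain HTTP, the same request works against a remote Solr host, and an equivalent curl call should be possible too.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical helper: the class name, core name, and URLs are assumptions.
class SolrPdfPoster {

    // Builds the URL of the extracting request handler for one document.
    // literal.id sets the document's unique key; commit=true makes it
    // searchable right away (fine for a test, costly for bulk loads).
    static URI buildExtractUri(String solrBaseUrl, String core, String docId) {
        String id = URLEncoder.encode(docId, StandardCharsets.UTF_8);
        return URI.create(solrBaseUrl + "/" + core
                + "/update/extract?literal.id=" + id + "&commit=true");
    }

    // Sends the raw PDF bytes as the request body and returns the HTTP status.
    // Solr parses the PDF on the server side.
    static int postPdf(URI extractUri, Path pdf)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(extractUri)
                .header("Content-Type", "application/pdf")
                .POST(HttpRequest.BodyPublishers.ofFile(pdf))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: SolrPdfPoster <solr-base-url> <file.pdf>");
            return;
        }
        Path pdf = Path.of(args[1]);
        // Uses the file name as the document id and "docs" as the core (assumed).
        URI uri = buildExtractUri(args[0], "docs", pdf.getFileName().toString());
        System.out.println("HTTP status: " + postPdf(uri, pdf));
    }
}
```

It could be run as `java SolrPdfPoster.java http://remote-host:8983/solr report.pdf`, where `remote-host` stands in for your central server. A status of 200 would indicate Solr accepted the document.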



Answer 2:

Assuming the PDFs are on a web server, you can use Nutch to fetch and parse them, and then push the index to Solr via its HTTP interface.