Solr Index PDF documents and post them to a remote

Posted 2019-08-05 06:35

Hi, I am a novice when it comes to Solr. Please guide me on the following hurdles.

1) Solr Index PDF documents

Solution tried

I used tika-app 0.9.jar to extract the content of the input PDF files into text files. Now I am trying to write Java code to index the documents into Solr.

2) Post them to a remote server

I need to post either the documents or the index to a central remote server. Can the curl command be used for this?

Regards, Balaji.

2 answers
Viruses.
#2 · 2019-08-05 06:48

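1) Solr Index PDF documents - I believe Solr does this for you: its ExtractingRequestHandler (Solr Cell) runs Tika internally, so you can send the PDFs straight to Solr through its HTTP interface or SolrJ instead of extracting the text yourself first.

2) Post the index to a remote server - Solr replication may fit the bill.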
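To make the HTTP route concrete, here is a minimal Java sketch that posts one PDF to a remote Solr instance. It is only an illustration under some assumptions: the extracting handler is enabled and mapped to `/update/extract` (as in Solr's example configs), the unique key field is `id`, and you are on Java 11 or later for `java.net.http`. The class name `SolrPdfPoster`, the host, and the core name are made up for the example; very old single-core installs may not have the core segment in the URL.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical helper, not part of Solr or SolrJ.
class SolrPdfPoster {

    // Builds the extract-handler URL. literal.id supplies the document's
    // unique key; commit=true makes it searchable immediately (handy for a
    // demo, but expensive if you post many files one by one).
    static URI buildExtractUri(String solrBase, String core, String docId) {
        String id = URLEncoder.encode(docId, StandardCharsets.UTF_8);
        return URI.create(solrBase + "/" + core
                + "/update/extract?literal.id=" + id + "&commit=true");
    }

    // Streams the raw PDF bytes to Solr and returns the HTTP status code.
    static int postPdf(URI uri, Path pdf) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/pdf")
                .POST(HttpRequest.BodyPublishers.ofFile(pdf))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 4) {
            System.out.println("usage: SolrPdfPoster <solrBase> <core> <docId> <pdfPath>");
            return;
        }
        URI uri = buildExtractUri(args[0], args[1], args[2]);
        int status = postPdf(uri, Path.of(args[3]));
        System.out.println("Solr responded with HTTP " + status);
    }
}
```

You would run it with something like `java SolrPdfPoster.java http://remote-host:8983/solr mycore doc1 /path/to/file.pdf`, where the host, core, and path are placeholders. Since this is a plain HTTP POST, the same request can also be sent with curl, so the separate Tika extraction step is not strictly needed.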

趁早两清
#3 · 2019-08-05 06:51

Assuming the PDFs are on a web server, you can use Nutch to fetch and parse them, and then push the index to Solr via its HTTP interface.
