I am new to Solr and have a couple of questions about indexing and searching:
- Can I configure Solr to index two tables (1. books and 2. computers, with no relationship between them, both in the same data source) if I want to have two search boxes? Is it possible to do something like defining two entities in one data-config.xml?
If yes, please let me know the steps.
I guess we could do it with two different data-config.xml files, but I need to know how to configure schema.xml and make the corresponding changes.
- How do I configure Solr to index both PDF files and MySQL on one Solr instance?
Please help me out, and let me know if there are any reference documents.
For two different tables with no relation:
data-config.xml:
<document>
<entity name="topic" transformer="TemplateTransformer" pk="topic_id" query="select topic_id,topic_title,creation_date,updation_date,vote_count,.....">
<field column="doc_id" template="TOPIC_${topic.topic_id}" />
<field column="doc_type" template="TOPIC" />
</entity>
<entity name="product" transformer="TemplateTransformer" pk="product_id" query="SELECT product_id,.....">
<field column="doc_id" template="PRODUCT_${product.product_id}" />
<field column="doc_type" template="PRODUCT" />
<field column="product_supplier_id" name="product_supplier_id" />
<field column="supplier_product_code" name="supplier_product_code" />
<field column="product_display_name" name="product_display_name" />
</entity>
</document>
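For data-config.xml to be used at all, the DataImportHandler has to be registered in solrconfig.xml. A minimal sketch, assuming the file sits in the core's conf directory and the handler is exposed at /dataimport (a common convention, but an assumption here):

```xml
<!-- solrconfig.xml: registers the DataImportHandler.
     The handler name "/dataimport" and the config file name are assumptions. -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <!-- Path is resolved relative to the core's conf directory -->
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
```

With that in place, requesting /dataimport?command=full-import on your Solr instance should run both entities in a single import.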
schema.xml:
<schema>
. . .
<fields>
<field name="doc_id" type="string" />
<field name="doc_type" type="string" />
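<!-- catchall is the default search field, so it needs a tokenized type; text_general is assumed here -->
<field name="catchall" type="text_general" stored="false" omitNorms="true" multiValued="true" />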
<field name="topic_title" type="text_general" />. . . .
</fields>
<uniqueKey>doc_id</uniqueKey>
<copyField source="*" dest="catchall" />
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>catchall</defaultSearchField>
</schema>
More info: http://www.lucidimagination.com/blog/2011/02/12/solr-powered-isfdb-part-4/
None of the fields above should be marked as required; otherwise indexing may fail for documents that do not have them.
You can then query from the browser, for example: http://localhost:8080/solr/select/?q=*:*&fq=doc_type:PRODUCT
You can easily accomplish this with Solr. Start by reading up on the DataImportHandler:
http://wiki.apache.org/solr/DataImportHandler
and on this example:
http://wiki.apache.org/solr/MultipleIndexes
Then google around for specific examples about entities. Basically, your data-config.xml should look something like this (not tested):
<entity name="books" transformer="TemplateTransformer" dataSource="myindex"
query="SELECT * FROM t_books">
<field column="category" template="books" name="category"/></entity>
<entity name="computers" transformer="TemplateTransformer"
dataSource="myindex"
query="SELECT * FROM t_computers">
<field column="category" template="computers" name="category"/></entity>
Use the template to tell the two entities apart, and define the category field as a string in your schema.xml. Also, pay attention to how you set the unique id, since rows from the two tables that end up with the same id will overwrite each other. Some info on this specific topic is here:
http://lucene.472066.n3.nabble.com/Indexing-multiple-entities-td504464.html
and also check here:
http://search.lucidimagination.com/search/document/f84c3abf7e859be1/dataimporthanlder_multiple_entities_will_step_into_each_other
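To make the unique id point concrete, here is an untested sketch of a complete data-config.xml. The connection details, the uid field name, and the assumption that both tables have a primary key column called id are all hypothetical:

```xml
<dataConfig>
  <!-- Hypothetical connection details: replace driver, url, user and password with your own -->
  <dataSource name="myindex" type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="solr" password="secret"/>
  <document>
    <entity name="books" transformer="TemplateTransformer" dataSource="myindex"
            query="SELECT * FROM t_books">
      <!-- Assumes t_books has a primary key column named id -->
      <field column="uid" template="books_${books.id}"/>
      <field column="category" template="books"/>
    </entity>
    <entity name="computers" transformer="TemplateTransformer" dataSource="myindex"
            query="SELECT * FROM t_computers">
      <!-- Assumes t_computers has a primary key column named id -->
      <field column="uid" template="computers_${computers.id}"/>
      <field column="category" template="computers"/>
    </entity>
  </document>
</dataConfig>
```

Prefixing the key with the entity name keeps book 1 and computer 1 from colliding, provided uid is declared as a string field and set as the uniqueKey in schema.xml.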
With this approach you have the two sets of data in the same index. If you want them to back two separate search boxes, you can simply run your searches like this:
myquery AND category:(books)
which returns only the results from books, or
myquery AND category:(computers)
which returns only the computers results.
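If you would rather not append the category clause in every client query, one option is to give each search box its own request handler in solrconfig.xml and let Solr add the restriction as a filter query. An untested sketch, where the handler names /books and /computers are made up for illustration:

```xml
<!-- solrconfig.xml: one handler per search box.
     Handler names are hypothetical; "appends" adds the fq to whatever the client sends. -->
<requestHandler name="/books" class="solr.SearchHandler">
  <lst name="appends">
    <str name="fq">category:books</str>
  </lst>
</requestHandler>

<requestHandler name="/computers" class="solr.SearchHandler">
  <lst name="appends">
    <str name="fq">category:computers</str>
  </lst>
</requestHandler>
```

A filter query narrows the result set without affecting relevance scores, so myquery sent to /books should return the same documents as myquery AND category:(books), though the scores may differ slightly.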
Hope it helps.
As for your PDF question, I believe you have to use Apache Tika. I won't be of much help here as I haven't used it myself, but here's the link:
http://wiki.apache.org/solr/TikaEntityProcessor
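Going by that wiki page, a PDF import would look roughly like the sketch below. It is untested; the baseDir path and the target field names (doc_id, content) are assumptions you would need to match to your own schema.xml:

```xml
<dataConfig>
  <!-- Binary data source that Tika reads the PDF bytes from -->
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <!-- Outer entity lists the files; baseDir is a placeholder path -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/path/to/pdfs" fileName=".*\.pdf"
            recursive="true" rootEntity="false" dataSource="null">
      <!-- Use the absolute file path as the unique key (field name assumed) -->
      <field column="fileAbsolutePath" name="doc_id"/>
      <!-- Inner entity extracts the text of each file with Tika -->
      <entity name="pdf" processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}" format="text" dataSource="bin">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Since a data-config.xml can declare more than one named dataSource, these entities could presumably sit next to the JDBC ones in the same document element, which would cover PDFs and MySQL on one Solr instance.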