Querying binary fields in Solr

Posted 2019-07-13 12:16

I'm using Solr to index records consisting of binary fields. I've specified the fields in schema.xml as follows:

<field name="id" type="binary" indexed="true" stored="true" required="true" multiValued="false" />

I'm able to add records to the index via a POST request, encoding and sending the fields as Base64 strings. The size of the collection's data directory is growing, so I know it is storing something; however, when I run a match-all query (q=*:*), Solr reports documents found but returns none, e.g.:

"response": {  
  "numFound": 364047,
  "start": 0,
  "maxScore": 1,
  "docs": []
}
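
For reference, the indexing step described above could be sketched roughly as follows. This is only an illustration: the helper name `build_update_payload` and the sample bytes are hypothetical, and the code only builds the JSON body that would be POSTed to the collection's update handler; it does not contact Solr.

```python
import base64
import json


def build_update_payload(raw_id: bytes) -> str:
    """Build a JSON update body whose `id` field is the Base64 form of raw_id.

    `build_update_payload` is a hypothetical helper for illustration only.
    """
    # Solr's binary field type expects the value as a Base64 string.
    encoded = base64.b64encode(raw_id).decode("ascii")
    # A JSON array of documents is one accepted shape for an update request.
    return json.dumps([{"id": encoded}])


# Sample bytes are made up for the example.
payload = build_update_payload(b"\x00\x01\xfe\xff")
print(payload)
```

Decoding the `id` value with `base64.b64decode` gives back the original bytes, which is a quick way to check the encoding on the client side before sending anything.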

Has anybody any idea what's causing this or how it can be resolved?
Thanks

Tags: solr solr4
1 answer
干净又极端
Reply #2 · 2019-07-13 13:15

Short answer: it cannot be solved.

If you read the Solr reference documentation, you will find very little information about the BinaryField type:

Class: BinaryField

Description: Binary data.

The current state is that BinaryField is intended only for storing binary data. Nothing more, nothing less. There is an open issue to change this, but it has not attracted much attention yet.

My personal assumption is that the reason lies in the fact that binary data is rarely just plain bytes. Most of the time it is an elaborate file format that requires special interpretation. A separate Apache project exists for this task: Apache Tika.

Many good articles and tutorials on Tika are spread all over the web. A good starting point for integrating it with Solr can also be found in the reference documentation (1, 2).
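
As a rough sketch of what the Tika integration (Solr Cell) looks like from the client side, the snippet below builds a request for the extracting request handler, which is conventionally mounted at `/update/extract`. Several things here are assumptions: the base URL, the collection name `mycollection`, the document id `doc1`, and the helper `build_extract_request` are all hypothetical; the handler must be enabled in solrconfig.xml; and `literal.id` assumes that `id` is a string field, unlike the binary `id` in the question.

```python
import urllib.parse
import urllib.request


def build_extract_request(solr_base: str, collection: str,
                          doc_id: str, data: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST to the extracting request handler.

    Hypothetical helper; the URL layout assumes a default Solr setup.
    """
    # literal.<field> sets a field value directly; commit=true makes the
    # document visible to searches right away.
    params = urllib.parse.urlencode({"literal.id": doc_id, "commit": "true"})
    url = f"{solr_base}/{collection}/update/extract?{params}"
    return urllib.request.Request(
        url,
        data=data,  # raw file bytes; Tika detects and parses the format
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


# Placeholder bytes stand in for a real file's content.
req = build_extract_request("http://localhost:8983/solr", "mycollection",
                            "doc1", b"example file bytes")
print(req.full_url)
# To actually send it against a running Solr: urllib.request.urlopen(req)
```

With this approach the raw file is sent as-is and Tika extracts searchable text on the server, so the content does not have to live in a BinaryField at all.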
