I have a Solr schema containing a multivalued field. I'm parsing documents externally to Solr and updating the index using http://wiki.apache.org/solr/UpdateJSON (see also http://wiki.apache.org/solr/UpdateXmlMessages). Below is a toy example that demonstrates the problem I'm trying to solve.
    {
      "add": {
        "doc": {
          "id": "MyDocumentID",
          "user": "MyUserID",
          "meals": ["pizza", "pizza", "pizza", "burger"]
        }
      }
    }
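Here is a minimal Python sketch of the workaround the example above implies. It assumes the per-term counts are already available on the client side as a dict. Each term is repeated as many times as it occurs, and the result is placed into an UpdateJSON "add" message with the same shape. The helper names `expand_counts` and `build_add_message` are hypothetical, and the sketch only builds the message. It does not post anything to Solr.

```python
import json


def expand_counts(counts):
    """Hypothetical helper: turn {"pizza": 3, "burger": 1} into
    ["pizza", "pizza", "pizza", "burger"] (insertion order is kept)."""
    values = []
    for term, n in counts.items():
        # Repeat the term n times so Solr's analyzer sees it n times.
        values.extend([term] * n)
    return values


def build_add_message(doc_id, user, meal_counts):
    """Hypothetical helper: build an "add" message shaped like the example."""
    return {
        "add": {
            "doc": {
                "id": doc_id,
                "user": user,
                "meals": expand_counts(meal_counts),
            }
        }
    }


message = build_add_message("MyDocumentID", "MyUserID", {"pizza": 3, "burger": 1})
print(json.dumps(message))
```

The length of the "meals" list equals the sum of the counts. A count of 10,000 for "pizza" therefore means 10,000 copies of the string in the request body. That linear growth is the cost a frequency syntax would avoid.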
I'm hoping to find some sort of syntax that will let me indicate that "pizza" occurs 3 times without actually writing it out 3 times. The issue is that some of these frequencies could be in the thousands or tens of thousands. (I'm making use of the stored term frequencies for filtering and ranking search results.) Does such a syntax exist? I'm making this up, but here's an example of how I imagine it might look:
    {
      "add": {
        "doc": {
          "id": "MyDocumentID",
          "user": "MyUserID",
          "meals": ["pizza"*3, "burger"]
        }
      }
    }
I suspect that the answer is that if I want behavior like this, I need to write some Solr code myself. I'd like to avoid that, but if that's the case, you could still help me by pointing me to the right area of the code to work on.
Here is a related Lucene question: Can I insert a Document into Lucene without generating a TokenStream?