Specifying multivalued term frequency in Solr upda

Posted 2019-08-05 08:26

Question:

I have a Solr schema containing a multivalued field. I'm parsing documents externally to Solr and updating the index using http://wiki.apache.org/solr/UpdateJSON (see also http://wiki.apache.org/solr/UpdateXmlMessages). Below is a toy example that demonstrates the problem I'm trying to solve.

{
  "add": {
    "doc": {
      "id": "MyDocumentID",
      "user": "MyUserID",
      "meals": ["pizza", "pizza", "pizza", "burger"]
    }
  }
}

I'm hoping to find some syntax that lets me indicate that "pizza" occurs 3 times without actually writing it out 3 times. The issue is that some of these frequencies could be in the thousands or tens of thousands. (I'm using the stored term frequencies for filtering and ranking search results.) Does such a syntax exist? I'm making this up, but here's how I imagine it might look:

{
  "add": {
    "doc": {
      "id": "MyDocumentID",
      "user": "MyUserID",
      "meals": ["pizza"*3, "burger"]
    }
  }
}
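As a stopgap, the repeated array does not have to be typed out by hand: it can be generated from a value-to-count map before the update is sent. Below is a minimal Python sketch that assumes the counts are already available as a dict. `expand_counts` and `build_add_command` are hypothetical helper names, not part of Solr. This only automates the repetition. The request body still grows with the counts, so it does not address the size concern.

```python
import json


def expand_counts(counts):
    """Expand a {value: count} map into a list with each value repeated count times."""
    values = []
    for value, count in counts.items():
        values.extend([value] * count)
    return values


def build_add_command(doc_id, user, meal_counts):
    # Hypothetical helper: builds an "add" command shaped like the toy example,
    # using the question's field names ("id", "user", "meals").
    return {
        "add": {
            "doc": {
                "id": doc_id,
                "user": user,
                "meals": expand_counts(meal_counts),
            }
        }
    }


payload = build_add_command("MyDocumentID", "MyUserID", {"pizza": 3, "burger": 1})
print(json.dumps(payload))
```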

I suspect that the answer is that if I want behavior like this, I need to write some Solr code myself. I hope to avoid that, but if that is the case, you could still help me by pointing me to the right area of the code to work on.

Here is a related Lucene question: Can I insert a Document into Lucene without generating a TokenStream?

Answer 1:

If you are willing to convert the JSON to XML, there might be a workaround:

Instead of

<add>
  <doc>
    <field name="employeeId">05991</field>
    <field name="skills" update="set">Python</field>
    <field name="skills" update="set">Python</field>
    <field name="skills" update="set">Python</field>
    <field name="skills" update="set">Java</field>
  </doc>
</add>

you should be able to use this (note that skills is a multivalued field):

<add>
  <doc>
    <field name="employeeId">05991</field>
    <field name="skills" update="set" boost="3.0">Python</field>
    <field name="skills" update="set">Java</field>
  </doc>
</add>

This is from the Solr wiki.

Disclaimer: I have never used multiple optional attributes on one field, nor have I seen an example that does.
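The answer's workaround converts each document to an XML update message and collapses duplicate values into a single boosted field. It could be sketched in Python with the standard library's ElementTree. `build_boosted_add` is a hypothetical helper, and the field names come from the example above. As far as I know, an index-time boost affects the field's norm (and therefore scoring) rather than the stored term frequency, so a function such as `termfreq()` would still count Python once. Index-time boosts were also dropped in Solr 7, so check your version before relying on this.

```python
import xml.etree.ElementTree as ET


def build_boosted_add(doc_id, skill_counts):
    # Hypothetical helper following the answer's idea: one <field> per distinct
    # skill, with the count carried in a boost attribute. The boost only
    # influences scoring; it does not set the indexed term frequency.
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    id_field = ET.SubElement(doc, "field", name="employeeId")
    id_field.text = doc_id
    for skill, count in skill_counts.items():
        attrs = {"name": "skills", "update": "set"}
        if count > 1:
            attrs["boost"] = str(float(count))
        field = ET.SubElement(doc, "field", attrs)
        field.text = skill
    return ET.tostring(add, encoding="unicode")


print(build_boosted_add("05991", {"Python": 3, "Java": 1}))
```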



Tags: solr