Solr: transform a comma-delimited field during dat

Posted 2019-04-10 04:32

Question:

I'm working with Solr 3.5.0. I am importing from a JDBC data source and have a delimited field that I would like split into individual values. I'm using the RegexTransformer but my field isn't being split.

sample value

Bob,Carol,Ted,Alice

data-config.xml

<dataConfig>
  <dataSource driver="..." />
  <document>
    <entity name="ent"
            query="SELECT id,names FROM blah"
            transformer="RegexTransformer">
      <field column="id" />
      <field column="names" splitBy="," />
    </entity>
  </document>
</dataConfig>

schema.xml

<schema name="mytest" version="1.0">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true"
               omitNorms="true"/>
    <fieldType name="integer" class="solr.IntField" omitNorms="true"/>
  </types>
  <fields>
    <field name="id" type="integer" indexed="false" stored="true"
           multiValued="false" required="true" />
    <field name="name" type="string" indexed="true" stored="true"
           multiValued="true" required="true" />
  </fields>
</schema>

When I search, I get a result doc element like this:

<doc>
  <int name="id">22</int>
  <arr name="names">
    <str>Bob,Carol,Ted,Alice</str>
  </arr>
</doc>

I was hoping to get this instead:

<doc>
  <int name="id">22</int>
  <arr name="names">
    <str>Bob</str>
    <str>Carol</str>
    <str>Ted</str>
    <str>Alice</str>
  </arr>
</doc>

It's quite possible I misunderstand the RegexTransformer section of the wiki. I've tried changing my delimiter and I've tried using a different field for the parts (as shown in the wiki)...

<field column="name" splitBy="," sourceColName="names" />

...but that resulted in an empty name field. What am I doing wrong?

Answer 1:

I handled a similar issue by creating a fieldtype in the schema file:

<fieldType name="commaDelimited" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.PatternTokenizerFactory" pattern=",\s*" />
      </analyzer>
</fieldType>

Then I applied that type to the data field, like this:

<field name="features" type="commaDelimited" indexed="true" stored="true"/>
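Applied to the schema from the question, this approach might look like the sketch below. It assumes the target field is the question's `name` field and omits the other types and fields for brevity. Note that a tokenizer only affects the indexed terms: the stored value returned in search results would still be the single string Bob,Carol,Ted,Alice, so this helps with searching on individual names rather than with getting separate `<str>` elements back.

```xml
<schema name="mytest" version="1.0">
  <types>
    <!-- Splits on a comma followed by optional whitespace, at index time only -->
    <fieldType name="commaDelimited" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.PatternTokenizerFactory" pattern=",\s*" />
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <!-- Assumption: 'name' from the question; one stored value, several indexed tokens -->
    <field name="name" type="commaDelimited" indexed="true" stored="true" />
  </fields>
</schema>
```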


Answer 2:

Your database column is called names, while the Solr field is called name (note the missing s). One solution is to use the following in your DIH config and then re-index:

<field name="name" column="names" splitBy=","/>
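In the context of the data-config.xml from the question, that mapping might look like the sketch below. It assumes the JDBC driver reports the column exactly as `names` (some drivers return column names in a different case) and that the Solr field `name` stays multiValued="true" as in the posted schema.

```xml
<dataConfig>
  <dataSource driver="..." />
  <document>
    <entity name="ent"
            query="SELECT id,names FROM blah"
            transformer="RegexTransformer">
      <field column="id" />
      <!-- Split the 'names' column on commas and map the parts
           to the multi-valued Solr field 'name' -->
      <field name="name" column="names" splitBy="," />
    </entity>
  </document>
</dataConfig>
```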


Answer 3:

Try putting transformer="RegexTransformer" before the query attribute. Also, you have an error here:

   transformer="RegexTransformer">

You need to remove the '>'.



Answer 4:

You can use transformer="RegexTransformer", or you can use JavaScript to split the value:

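<script><![CDATA[
// Split the comma-delimited 'names' column into the 'name' field.
function stringtoarray(row) {
  var value = row.get('names');

  if (value != null && value != "") {
    var name_arr = value.split(",");
    row.put('name', name_arr);
  }
  // Always return the row, even when there is nothing to split.
  return row;
}
]]>
</script>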

Then add transformer="script:stringtoarray" to the entity element.
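For reference, a minimal sketch of how the script and the entity could fit together in data-config.xml, reusing the names from the question. Two things here are assumptions rather than part of the answer: the function copies the parts into a java.util.ArrayList, in case the scripting engine's own array type is not picked up as a multi-valued value, and the `<field column="name" />` line is added to make the mapping explicit.

```xml
<dataConfig>
  <dataSource driver="..." />
  <script><![CDATA[
    // Copy the comma-separated parts of 'names' into a Java list under 'name'.
    function stringtoarray(row) {
      var value = row.get('names');
      // Force a JavaScript string so that split() behaves the same in any engine
      var text = (value == null) ? "" : String(value);
      if (text != "") {
        var parts = text.split(",");
        var list = new java.util.ArrayList();
        for (var i = 0; i < parts.length; i++) {
          list.add(parts[i]);
        }
        row.put('name', list);
      }
      // Return the row in every case so it is still indexed
      return row;
    }
  ]]></script>
  <document>
    <entity name="ent"
            query="SELECT id,names FROM blah"
            transformer="script:stringtoarray">
      <field column="id" />
      <field column="name" />
    </entity>
  </document>
</dataConfig>
```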



Tags: solr