Solr DIH regextransformer - processes only one CSV

2019-04-16 21:57发布

问题:

Hi I have the following CSV file

132 1536130302256087040
133 1536130302256087041
134 1536130302256087042

the fields are seperated by a tab. Now I have the Dataimporthandler (DIH) for the solr, and I try to import the CSV into solr, but I only get the first line into solr. Thats the result, but the other lines from the CSV are missing:

  "response": {
    "numFound": 1,
    "start": 0,
    "maxScore": 1,
    "docs": [ {
        "string": "1536130302256087040",
        "id": "132",
        "_version_": 1536202153221161000
      } ] }

Here is my data-config.xml

<dataConfig>
<dataSource type="FileDataSource" encoding="UTF-8" name="fds"/>
    <document>

     <entity name="f" 
     processor="FileListEntityProcessor" 
     fileName="myfile.csv" 
     baseDir="/var/www/solr-5.4.0/server/csv/files" 
     recursive="false" 
     rootEntity="true" 
     dataSource="null" >

     <entity 
     onError="continue" 
     name="jc"   
     processor="LineEntityProcessor" 
     url="${f.fileAbsolutePath}" 
     dataSource="fds"  
     rootEntity="true" 
     header="false"
     separator="\t"
     transformer="RegexTransformer" >

     <field column="id" name="id" sourceColName="rawLine" regex="^(.*)\t"/>
     <field column="string" name="string" sourceColName="rawLine" regex="\t(.*)$"/>

             </entity>            
        </entity>
    </document>
</dataConfig>

Here is my schema.xml

<field name="id" type="text_general" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="string" type="text_general" indexed="true" stored="true" multiValued="false"/>
<field name="_version_" type="long" indexed="true" stored="true"/>

 <uniqueKey>id</uniqueKey>

What I'm doing wrong?

回答1:

You have rootEntity=true for both levels of entities. So, you will only get one document for the outer entity. Try setting the outer level rootEntity to false.

Also, you can just send tab-separated files to the Solr with CSV processor, no DIH required.