Concatenate data from two columns of a CSV file using Ant

Posted 2019-07-31 02:45

Question:

Is there a way, using Ant, to create a new column in a CSV file that contains the concatenation of two other columns joined with "-"?

Example:

customer,deal,NEWFIELD
200000042,23,200000042-23
200000042,34,200000042-34
200000042,35,200000042-35
200000042,65,200000042-65

Answer 1:

Would it be simpler to embed a scripting language like Groovy?

Example

├── build.xml
├── src
│   └── file1.csv
└── target
    └── file1.csv

src/file1.csv

customer,deal
200000042,23
200000042,34
200000042,35
200000042,65

target/file1.csv

customer,deal,customer-deal
200000042,23,200000042-23
200000042,34,200000042-34
200000042,35,200000042-35
200000042,65,200000042-65

build.xml

<project name="demo" default="build">

  <available classname="org.codehaus.groovy.ant.Groovy" property="groovy.installed"/>

  <target name="build" depends="install-groovy">
    <taskdef name="groovy" classname="org.codehaus.groovy.ant.Groovy"/>

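    <groovy>
      ant.mkdir(dir:"target")

      // Read src/file1.csv line by line and append "<customer>-<deal>" to each row.
      // The header row is processed like any other row, which yields "customer-deal".
      new File("target/file1.csv").withWriter { out ->
        new File("src/file1.csv").splitEachLine(",") { customer, deal ->
          out.println "${customer},${deal},${customer}-${deal}"
        }
      }
    </groovy>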
  </target>

  <target name="install-groovy" description="Install groovy" unless="groovy.installed">
    <mkdir dir="${user.home}/.ant/lib"/>
    <get dest="${user.home}/.ant/lib/groovy.jar" src="http://search.maven.org/remotecontent?filepath=org/codehaus/groovy/groovy-all/2.4.7/groovy-all-2.4.7.jar"/>
    <fail message="Groovy has been installed. Run the build again"/>
  </target>

</project>


Answer 2:

You can do this with Ant filterchains, as in this basic example:

<property name="in.file" value="input.txt" />
<property name="out.file" value="output.txt" />
<property name="new.field" value="NEWFIELD" />
<property name="sep.char" value="," />

<loadfile srcfile="${in.file}" property="file.head">
  <filterchain>
    <headfilter lines="1" />
    <striplinebreaks />
  </filterchain>
</loadfile>
<loadfile srcfile="${in.file}" property="file.body">
  <filterchain>
    <headfilter lines="-1" skip="1" /> <!-- lines defaults to 10; -1 passes every remaining line -->
    <tokenfilter>
        <replaceregex pattern="^([^${sep.char}]*)${sep.char}([^${sep.char}]*)$"
                      replace="\1${sep.char}\2${sep.char}\1-\2" />
    </tokenfilter>
  </filterchain>
</loadfile>

<echo file="${out.file}">${file.head}${sep.char}${new.field}
${file.body}</echo>

Two <loadfile> tasks read the header and the body of the file separately, then a simple <echo> task writes the output. A simple regular expression is enough here because each line has exactly two fields; the pattern is anchored with ^ and $, so a line with a different number of fields does not match and is passed through unchanged. The replaceregex captures the first and second fields in groups \1 and \2, and the replace string reassembles them as field1,field2,field1-field2.
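To make the per-line behaviour of the filterchain easier to check outside Ant, here is a minimal JavaScript sketch of the same header/body split and the same anchored two-field pattern. The function name addConcatColumn and its parameters are hypothetical, and it assumes a separator that is not a regex metacharacter (such as ","). It is written in plain ES5 style, like the scriptfilter example.

```javascript
// Hypothetical stand-alone version of the two <loadfile> filterchains plus <echo>:
// the first line is the header, every other line goes through the same
// "exactly two fields" regular expression as the replaceregex.
function addConcatColumn(csvText, newField, sep) {
  var lines = csvText.split(/\r?\n/);
  // Same pattern as the Ant example: ^([^,]*),([^,]*)$
  // (sep is inserted unescaped, so it must not be a regex metacharacter)
  var twoFields = new RegExp("^([^" + sep + "]*)" + sep + "([^" + sep + "]*)$");
  var body = lines.slice(1)
    .filter(function (line) { return line !== ""; }) // drop blank lines, e.g. after a trailing newline
    .map(function (line) {
      // $1/$2 play the role of \1/\2 in the Ant replace attribute;
      // a line without exactly two fields does not match and is returned unchanged
      return line.replace(twoFields, "$1" + sep + "$2" + sep + "$1-$2");
    });
  return [lines[0] + sep + newField].concat(body).join("\n");
}

console.log(addConcatColumn("customer,deal\n200000042,23\n200000042,34\n", "NEWFIELD", ","));
```

Unlike the Ant version, this sketch drops blank lines and does not keep a trailing newline.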

If there are more fields, a scriptfilter in the second loadfile may be simpler to work with:

<loadfile srcfile="${in.file}" property="file.body">
  <filterchain>
    <headfilter lines="-1" skip="1" /> <!-- lines defaults to 10; -1 passes every remaining line -->
    <scriptfilter language="javascript"><![CDATA[
      var line = self.getToken( );
      var fields = line.split( "," );
      self.setToken( line + "," + fields[0] + "-" + fields[1] );
    ]]></scriptfilter>
  </filterchain>
</loadfile>

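This version takes each line, splits it on commas and appends field1-field2, keeping any further fields as they are. Note that language="javascript" requires a JavaScript script engine to be available to Ant; Nashorn, which used to provide one, was removed from the JDK in Java 15.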
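The scriptfilter body can be exercised on its own by turning it into a plain function. In this sketch the function name appendFirstTwoFields is hypothetical, it takes and returns the line where Ant would use self.getToken() and self.setToken(), and the extra EUR column is made-up sample data.

```javascript
// Stand-alone copy of the scriptfilter logic. In Ant the line comes from
// self.getToken() and goes back through self.setToken(); here a hypothetical
// function takes the line and returns the result instead.
function appendFirstTwoFields(line) {
  var fields = line.split(",");
  // Append "<field 1>-<field 2>"; any further columns are kept as they are.
  return line + "," + fields[0] + "-" + fields[1];
}

console.log(appendFirstTwoFields("200000042,23"));
console.log(appendFirstTwoFields("200000042,23,EUR"));
```

One thing the sketch makes visible: a line with no comma has no fields[1], so in this JavaScript version the result ends in "-undefined"; real input may need a guard for blank or malformed lines.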

Neither example handles data containing embedded commas, such as quoted CSV fields.
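If quoted fields with embedded commas can occur, the plain split(",") has to give way to a splitter that tracks quotes. Below is a minimal JavaScript sketch, assuming RFC 4180-style double-quote escaping; splitCsvLine, quoteCsvField and the "ACME, Inc." sample are hypothetical. It sticks to ES5 syntax, so it may be adaptable to the scriptfilter, but whether it runs there unchanged depends on the script engine available to Ant.

```javascript
// Hypothetical quote-aware splitter for a single CSV line, assuming RFC 4180-style
// quoting: a field wrapped in double quotes may contain commas, and "" inside it
// stands for one literal quote. Line breaks inside quoted fields are not handled.
function splitCsvLine(line) {
  var fields = [];
  var current = "";
  var inQuotes = false;
  for (var i = 0; i < line.length; i++) {
    var ch = line.charAt(i);
    if (inQuotes) {
      if (ch === '"' && line.charAt(i + 1) === '"') {
        current += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch; // ordinary character, including a comma inside quotes
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ",") {
      fields.push(current); // field separator outside quotes
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Quote a value on output if it contains a comma, a quote or a line break.
function quoteCsvField(value) {
  return /[",\r\n]/.test(value) ? '"' + value.replace(/"/g, '""') + '"' : value;
}

var line = '"ACME, Inc.",23';
var f = splitCsvLine(line);
console.log(line + "," + quoteCsvField(f[0] + "-" + f[1]));
```

The new field is quoted on output as well, because the concatenated value ACME, Inc.-23 itself contains a comma.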



Tags: csv ant