skip double quotes when reading csv file using apa

Posted 2019-08-03 09:31

Question:

Reader in = new FileReader(dataFile);
Iterable<CSVRecord> records = CSVFormat.RFC4180.withFirstRecordAsHeader().withIgnoreEmptyLines(true).withTrim().parse(in);

// Read the data in the CSV file until the last row is encountered
for (CSVRecord record : records) {
    String column1 = record.get("column1");
}

Here the column1 value in the CSV file is something like "1234557. So when I read the column, the value is returned with the quote at the start. Is there any way in Apache Commons CSV to skip it?

Sample data from the CSV file: """0996108562","""204979956"

Answer 1:

Unable to reproduce using commons-csv-1.4.jar with this MCVE (Minimal, Complete, and Verifiable example):

String input = "column1,column2\r\n" +
               "1,Foo\r\n" +
               "\"2\",\"Bar\"\r\n";
CSVFormat csvFormat = CSVFormat.RFC4180.withFirstRecordAsHeader()
                                       .withIgnoreEmptyLines(true)
                                       .withTrim();
try (CSVParser records = csvFormat.parse(new StringReader(input))) {
    for (CSVRecord record : records) {
        String column1 = record.get("column1");
        String column2 = record.get("column2");
        System.out.println(column1 + ": "+ column2);
    }
}

Output:

1: Foo
2: Bar

The quotes around "2" and "Bar" have been removed.
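
Note that the sample data in the question is not the same shape as the MCVE input. Under RFC 4180 quoting, a pair of double quotes inside a quoted field stands for one literal quote, so the raw field """0996108562" decodes to "0996108562 with a leading quote that is part of the data. If that quote is unwanted, one option is to strip it after reading the value. The following is a minimal sketch using only the JDK; the class QuoteStripper and the method stripLeadingQuotes are hypothetical names for illustration and are not part of Apache Commons CSV.

```java
class QuoteStripper {
    // Hypothetical helper: drops every double quote at the start of a value
    // and leaves the rest of the string untouched.
    static String stripLeadingQuotes(String value) {
        int start = 0;
        while (start < value.length() && value.charAt(start) == '"') {
            start++;
        }
        return value.substring(start);
    }

    public static void main(String[] args) {
        // The decoded form of the raw CSV field """0996108562"
        String parsed = "\"0996108562";
        System.out.println(stripLeadingQuotes(parsed)); // prints 0996108562
    }
}
```

In the question's loop this would be applied to the result of record.get("column1").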



Answer 2:

If I understand your requirement correctly, you need to use unescapeCsv from StringEscapeUtils in Apache Commons Lang. As the doc says:

If the value is enclosed in double quotes, and contains a comma, newline or double quote, then quotes are removed.

Any double quote escaped characters (a pair of double quotes) are unescaped to just one double quote.

If the value is not enclosed in double quotes, or is and does not contain a comma, newline or double quote, then the String value is returned unchanged.
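
To see what the three quoted rules imply for the data in the question, here is a minimal sketch that re-implements them with plain JDK string methods. It is a simplified illustration, not the library's code; the class CsvUnescapeSketch and the method unescapeCsvSketch are hypothetical names, and treating both \n and \r as "newline" is an assumption.

```java
class CsvUnescapeSketch {
    // Simplified re-implementation of the documented rules (not the library code).
    static String unescapeCsvSketch(String value) {
        boolean enclosed = value.length() >= 2
                && value.startsWith("\"") && value.endsWith("\"");
        if (!enclosed) {
            return value; // rule 3: not enclosed in quotes, returned unchanged
        }
        String inner = value.substring(1, value.length() - 1);
        boolean special = inner.contains(",") || inner.contains("\n")
                || inner.contains("\r") || inner.contains("\"");
        if (!special) {
            return value; // rule 3: enclosed, but nothing that required quoting
        }
        // rules 1 and 2: drop the enclosing quotes, collapse "" to "
        return inner.replace("\"\"", "\"");
    }

    public static void main(String[] args) {
        System.out.println(unescapeCsvSketch("\"a,b\""));            // a,b
        System.out.println(unescapeCsvSketch("\"plain\""));          // "plain"
        System.out.println(unescapeCsvSketch("\"\"\"0996108562\"")); // "0996108562
        System.out.println(unescapeCsvSketch("\"0996108562"));       // "0996108562
    }
}
```

Under these rules the raw field """0996108562" still unescapes to a value with one leading quote, and a value that has only a leading quote is returned unchanged, so the leading quote would still have to be removed separately if it is not wanted.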