Reading CSV file line by line and parsing it

Posted 2019-08-19 23:34

Question:

I have a CSV file that I need to read line by line with the help of a Scanner and store only country names into an array of strings. Here is my CSV file:

World Development Indicators
Number of countries,4
Country Name,2005,2006,2007
Bangladesh,6.28776238,13.20573922,23.46762823
"Bahamas,The",69.21279415,75.37855087,109.340767
Brazil,46.31418452,53.11025849,63.67475185
Germany,94.55486999,102.2828888,115.1403608

This is what I have so far:

public String[] getCountryNames() throws IOException, FileNotFoundException{
    String[] countryNames = new String[3];
    int index = 0;
    BufferedReader br = new BufferedReader(new FileReader(fileName));
    br.readLine();
    br.readLine();
    br.readLine();
    String line = br.readLine();
    while((br.readLine() != null) && !line.isEmpty()){
        String[] countries = line.split(",");
        countryNames[index] = countries[0];
        index++;
        line = br.readLine();
    }
    System.out.println(Arrays.toString(countryNames));
    return countryNames;
}

Output:

[Bangladesh, Brazil, null]

For some reason it skips "Bahamas, The" and can't read Germany. Please help me, I have been stuck on this method for hours already. Thanks for your time and effort. The return should be an array of Strings (country names).

Answer 1:

There are two issues with your code for parsing this CSV file. As a few folks have pointed out, you're calling readLine on your reader too many times and discarding the result. Each call to readLine advances the reader past the line it returns, so a line you don't store is gone for good. The loop condition br.readLine() != null, for example, reads a new line from the stream, checks that it isn't null, and then throws it away because the result is never assigned to a variable. That's the main reason you're losing data while reading: every other row is consumed by the condition instead of the loop body.
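To see exactly which rows get lost, here is a minimal sketch that replays the question's loop over in-memory data and logs what happens to each line. The class name ReadLineTrace, the probe variable, and the simplified one-column rows are assumptions made for illustration; the header lines are taken as already skipped.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ReadLineTrace {
    // Replays the question's loop and records the fate of every line readLine() returns.
    static List<String> trace(String data) {
        List<String> log = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(data))) {
            String line = br.readLine();
            String probe; // holds the line that the original condition read and dropped
            while ((probe = br.readLine()) != null && !line.isEmpty()) {
                log.add("kept: " + line);
                log.add("discarded: " + probe);
                line = br.readLine(); // second read in the same iteration
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        trace("Bangladesh\nBahamas\nBrazil\nGermany\n").forEach(System.out::println);
    }
}
```

With the four rows above, the log shows Bangladesh and Brazil kept and Bahamas and Germany discarded, which matches the [Bangladesh, Brazil, null] output in the question.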

The second issue is your split condition. Splitting on commas makes sense for a CSV file, but your data contains commas inside quoted fields too (for example, "Bahamas,The"), so a plain split(",") cuts that name in two. You'll need a more specific split condition, one that ignores commas inside double quotes, as described in this post.

Here's an example of what this might look like (using a list for the countryNames instead of an array, because that's much easier to work with):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Sample data held in a string so the example runs without a file.
private static final String csv = "World Development Indicators\n"
    + "Number of countries,4\n"
    + "Country Name,2005,2006,2007\n"
    + "Bangladesh,6.28776238,13.20573922,23.46762823\n"
    + "\"Bahamas,The\",69.21279415,75.37855087,109.340767\n"
    + "Brazil,46.31418452,53.11025849,63.67475185\n"
    + "Germany,94.55486999,102.2828888,115.1403608\n";

public static String[] getCountryNames() throws IOException {
    List<String> countryNames = new ArrayList<>();

    // To read from a file instead: new BufferedReader(new FileReader(fileName))
    try (BufferedReader br = new BufferedReader(new StringReader(csv))) {
        // Skip the three header lines.
        br.readLine();
        br.readLine();
        br.readLine();

        // Read each line exactly once and keep it in a variable.
        String line = br.readLine();
        while (line != null && !line.isEmpty()) {
            // Split only on commas that are outside double quotes.
            String[] countries = line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1);
            // A quoted field keeps its surrounding quotes here, e.g. "Bahamas,The".
            countryNames.add(countries[0]);
            line = br.readLine();
        }
    }

    System.out.println(countryNames);
    return countryNames.toArray(new String[0]);
}
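
To make the difference between the two split conditions concrete, here is a small sketch that applies both to the Bahamas row and then strips the surrounding quotes from the first field. The class name QuotedSplit and the helper firstField are hypothetical, and the sketch assumes fields never contain escaped quotes ("") or line breaks; for files like that a dedicated CSV parser is probably the safer choice.

```java
import java.util.Arrays;

class QuotedSplit {
    // Matches a comma only when an even number of double quotes follows it,
    // which means the comma is not inside a quoted field.
    private static final String COMMA_OUTSIDE_QUOTES = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical helper: first field of a row, with surrounding quotes removed.
    static String firstField(String line) {
        String field = line.split(COMMA_OUTSIDE_QUOTES, -1)[0];
        if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
            field = field.substring(1, field.length() - 1);
        }
        return field;
    }

    public static void main(String[] args) {
        String row = "\"Bahamas,The\",69.21279415,75.37855087,109.340767";
        // Plain split: the country name is broken at its inner comma.
        System.out.println(Arrays.toString(row.split(",")));
        // Quote-aware split plus unquoting keeps the name in one piece.
        System.out.println(firstField(row));
    }
}
```

With the plain split, the first element is the fragment "Bahamas (including the opening quote); firstField returns Bahamas,The.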


Answer 2:

It seems that you're reading too many lines, as seen below:

String line = br.readLine(); // Reads 1 line
while((br.readLine() != null) && !line.isEmpty()){ // Reads 1 line per iteration (and doesn't store it in a variable)
    String[] countries = line.split(",");
    countryNames[index] = countries[0];
    index++;
    line = br.readLine(); // Reads another line per iteration
}

A correct version of the while loop is:

String line;

while((line = br.readLine()) != null && !line.isEmpty() && index < countryNames.length) {
    String[] countries = line.split(",");
    countryNames[index++] = countries[0];
}

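Notice how line is assigned within the condition rather than within the loop body, so each line is read exactly once. The index < countryNames.length check stops the loop before it writes past the end of the array; since the file lists four countries, the array needs four slots rather than three. This loop still splits on every comma, so a quoted name such as "Bahamas,The" is cut at its inner comma.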
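
Putting the pieces together, one possible sketch sizes the array from the "Number of countries" line instead of hard-coding it, and reads each row once in the loop condition. The class name CountryNameReader and both method names are hypothetical. The sketch assumes the file always starts with the same three header lines, that the count on the second line is a valid integer, and that quoted names contain no escaped quotes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;

class CountryNameReader {
    // Hypothetical helper: a quoted first field ends at its closing quote,
    // an unquoted one ends at the first comma.
    static String firstField(String line) {
        if (line.startsWith("\"")) {
            int close = line.indexOf('"', 1);
            if (close > 0) {
                return line.substring(1, close);
            }
        }
        int comma = line.indexOf(',');
        return comma < 0 ? line : line.substring(0, comma);
    }

    static String[] readCountryNames(Reader source) {
        try (BufferedReader br = new BufferedReader(source)) {
            br.readLine();                     // title line
            String countLine = br.readLine();  // e.g. "Number of countries,4"
            br.readLine();                     // column header line
            int count = Integer.parseInt(
                    countLine.substring(countLine.lastIndexOf(',') + 1).trim());

            String[] names = new String[count];
            int index = 0;
            String line;
            // The line is read and stored in the condition, once per iteration.
            while (index < count && (line = br.readLine()) != null && !line.isEmpty()) {
                names[index++] = firstField(line);
            }
            // Trim the array if the file held fewer rows than announced.
            return Arrays.copyOf(names, index);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String csv = "World Development Indicators\n"
                + "Number of countries,4\n"
                + "Country Name,2005,2006,2007\n"
                + "Bangladesh,6.28776238,13.20573922,23.46762823\n"
                + "\"Bahamas,The\",69.21279415,75.37855087,109.340767\n"
                + "Brazil,46.31418452,53.11025849,63.67475185\n"
                + "Germany,94.55486999,102.2828888,115.1403608\n";
        System.out.println(Arrays.toString(readCountryNames(new StringReader(csv))));
    }
}
```

To read from the actual file, pass new FileReader(fileName) instead of the StringReader.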