How to easily process a CSV file into a List

Published 2020-02-26 11:31

Question:

In my application I use a lot of CSV files, which I have to read and build lists from. I'd like to find an easy way to do this. Do you know of any simple framework that does it without needing a number of config files etc.?

For instance, I have got a class Person:

public class Person {
    String name;
    String surname;

    double shoeSize;
    boolean sex; // true: male, false:female

    public Person() {
    }

    public String getName() {
            return name;
    }

    public void setName(String name) {
            this.name = name;
    }

    public String getSurname() {
            return surname;
    }

    public void setSurname(String surname) {
            this.surname = surname;
    }

    public double getShoeSize() {
            return shoeSize;
    }

    public void setShoeSize(double shoeSize) {
            this.shoeSize = shoeSize;
    }

    public boolean isSex() {
            return sex;
    }

    public void setSex(boolean sex) {
            this.sex = sex;
    }

}

For this class, I have prepared a CSV file:

name,surname,shoesize,sex
Tom,Tommy,32,true
Anna,Anny,27,false

How can I do it easily?

Answer 1:

There are a lot of good Java frameworks for parsing a CSV file into a List of objects; OpenCSV, JSefa and jCSV, to name a few.

For your requirement, I believe jCSV suits best. Below is sample code from the jCSV project that you can easily adapt.

Reader reader = new FileReader("persons.csv");

CSVReader<Person> csvPersonReader = ...;

// read all entries at once
List<Person> persons = csvPersonReader.readAll();

// read each entry individually
Iterator<Person> it = csvPersonReader.iterator();
while (it.hasNext()) {
  Person p = it.next();
  // ...
}

Moreover, parsing a CSV file and converting it to a List isn't a big deal; it can be done without any framework, as shown below.

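import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;

// csvFileToRead is the path of the CSV file; splitBy is the field separator
String splitBy = ",";
List<Person> personList = new ArrayList<>();
try (BufferedReader br = new BufferedReader(new FileReader(csvFileToRead))) {
       br.readLine(); // skip the header row (name,surname,shoesize,sex)
       String line;
       while ((line = br.readLine()) != null) {
              // split on comma (',')
              String[] personCsv = line.split(splitBy);

              // create a Person object to store the values
              Person personObj = new Person();

              // copy the values from the CSV row into the Person object,
              // converting the numeric and boolean columns
              personObj.setName(personCsv[0]);
              personObj.setSurname(personCsv[1]);
              personObj.setShoeSize(Double.parseDouble(personCsv[2]));
              personObj.setSex(Boolean.parseBoolean(personCsv[3]));

              // add the Person object to the list
              personList.add(personObj);
       }
}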
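For comparison, here is a self-contained sketch of the same framework-free idea using `BufferedReader.lines()` and streams (Java 8+). It assumes the header row from the question and no quoted fields; the class name `SimpleCsvPersons` and its minimal nested `Person` (plain fields instead of getters/setters) are illustrative stand-ins, not part of the answer above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

class SimpleCsvPersons {
    // Minimal stand-in for the question's Person bean (hypothetical)
    static class Person {
        String name;
        String surname;
        double shoeSize;
        boolean sex; // true: male, false: female
    }

    // Reads all data rows (after the header) into Person objects.
    // Assumes plain comma-separated values without quoted fields.
    static List<Person> read(BufferedReader reader) {
        return reader.lines()
                .skip(1)                                 // drop the header row
                .filter(line -> !line.trim().isEmpty())  // ignore blank lines
                .map(line -> line.split(",", -1))        // -1 keeps trailing empty fields
                .map(cols -> {
                    Person p = new Person();
                    p.name = cols[0].trim();
                    p.surname = cols[1].trim();
                    p.shoeSize = Double.parseDouble(cols[2].trim());
                    p.sex = Boolean.parseBoolean(cols[3].trim());
                    return p;
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        String csv = "name,surname,shoesize,sex\nTom,Tommy,32,true\nAnna,Anny,27,false\n";
        try (BufferedReader br = new BufferedReader(new StringReader(csv))) {
            for (Person p : read(br)) {
                System.out.println(p.name + " " + p.surname + " " + p.shoeSize + " " + p.sex);
            }
        }
    }
}
```

To read an actual file, pass `Files.newBufferedReader(Paths.get("persons.csv"))` from `java.nio.file` instead of the `StringReader`. A row with fewer than four columns will throw `ArrayIndexOutOfBoundsException`, just like the loop above.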

If the mapping of CSV columns to bean properties is complex, repetitive or large in a real-world scenario, it can be done easily with DozerBeanMapper.

Hope this will help you.

Shishir



Answer 2:

One of the simplest ways to read and serialize data is the Jackson library. It also has an extension for CSV (jackson-dataformat-csv), which is documented on its project wiki.

Let's say you have a POJO like this:

import com.fasterxml.jackson.annotation.JsonPropertyOrder;

@JsonPropertyOrder({ "name", "surname", "shoesize", "gender" })
public class Person {

    public String name;
    public String surname;
    public int shoesize;
    public String gender;

}

And a CSV like this:

Tom,Tommy,32,m
Anna,Anny,27,f

Then reading it is done like so:

MappingIterator<Person> personIter = new CsvMapper().readerWithTypedSchemaFor(Person.class).readValues(csvFile);
List<Person> people = personIter.readAll();

This is simple enough for my taste: all you need to do is declare the order of the CSV columns on your class with the @JsonPropertyOrder annotation, then read the file with the two lines above.



Answer 3:

Not sure if you need to go as far as using an external library (and taking the usually implied performance hit). It's a pretty simple thing to implement. And if nothing else, it always helps to know what's going on behind the scenes in such a library:

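import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public List<Person> readFile(String fileName) throws IOException {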
    List<Person> result = new ArrayList<Person>();
    BufferedReader br = new BufferedReader(new FileReader(new File(fileName)));
    try {
        // Read first line
        String line = br.readLine();
        // Make sure file has correct headers
        if (line==null) throw new IllegalArgumentException("File is empty");
        if (!line.equals("name,surname,shoesize,sex"))
            throw new IllegalArgumentException("File has wrong columns: "+line);
        // Run through following lines
        while ((line = br.readLine()) != null) {
            // Break line into entries using comma
            String[] items = line.split(",");
            try {
                // If there are too many entries, throw a dummy exception, if
                // there are too few, the same exception will be thrown later
                if (items.length>4) throw new ArrayIndexOutOfBoundsException(); 
                // Convert data to person record
                Person person = new Person();
                person.setName    (                     items[0] );
                person.setSurname (                     items[1] );
                person.setShoeSize(Double .parseDouble (items[2]));
                person.setSex     (Boolean.parseBoolean(items[3]));
                result.add(person);
            } catch (ArrayIndexOutOfBoundsException|NumberFormatException|NullPointerException e) {
                // Caught errors indicate a problem with data format -> Print warning and continue
                System.out.println("Invalid line: "+ line);
            }
        }
        return result;
    } finally {
        br.close();
    }
}

Note that the catch statement uses Java 7 multi-catch. For older Java versions, either split it into 3 catch blocks or replace ArrayIndexOutOfBoundsException|NumberFormatException|NullPointerException with Exception. The latter is usually discouraged as it masks and ignores all other exceptions as well, but in a simple example like this, the risk is probably not too high.

Unfortunately, this answer is specific to your problem, but since it is very straightforward, it should be easy to adapt to other situations as well.

Another neat thing you can do is match each line inside the while loop against a regular expression instead of simply splitting it on commas. That way you can also validate the data in one shot (for example, only accept a sensible number for the shoe size).
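A minimal sketch of that regex idea, assuming the question's four columns and no quoted fields. The class name `RegexPersonParser` and the "sensible" shoe-size rule (one or two digits with optional decimals) are illustrative assumptions, not part of the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexPersonParser {
    // Minimal stand-in for the question's Person bean (hypothetical)
    static class Person {
        String name;
        String surname;
        double shoeSize;
        boolean sex; // true: male, false: female
    }

    // One pattern both splits the line and validates each field:
    //   name, surname: one or more non-comma characters
    //   shoe size: 1-2 digits with an optional decimal part (assumed sensible range)
    //   sex: true or false, case-insensitive
    private static final Pattern LINE = Pattern.compile(
            "([^,]+),([^,]+),(\\d{1,2}(?:\\.\\d+)?),(true|false)",
            Pattern.CASE_INSENSITIVE);

    // Returns the parsed Person, or null if the line does not match the pattern
    static Person parseLine(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) { // matches() requires the whole line to fit the pattern
            return null;
        }
        Person p = new Person();
        p.name = m.group(1);
        p.surname = m.group(2);
        p.shoeSize = Double.parseDouble(m.group(3));
        p.sex = Boolean.parseBoolean(m.group(4)); // parseBoolean ignores case
        return p;
    }

    public static void main(String[] args) {
        String[] lines = {"Tom,Tommy,32,true", "Anna,Anny,270,false", "Bob,Bobby,abc,true"};
        for (String line : lines) {
            Person p = parseLine(line);
            System.out.println(p == null ? "Invalid line: " + line : p.name + " " + p.shoeSize);
        }
    }
}
```

Returning `null` for a non-matching line lets the caller print the same "Invalid line" warning as the loop above, without relying on caught exceptions.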

Note that the above implementation doesn't work if a name contains a comma and is therefore enclosed in quotes (like "Jackson, Jr." as a last name). You can cover this case "easily" with regular expressions as described above, or by checking whether the last name starts with a quotation mark and, if so, combining items[1] with items[2] and using items[3] and items[4] for the shoe size and sex. Most of the external libraries suggested here will likely handle this special case, so if you're not worried about dependencies, licensing issues and performance hits, they might be the easier way out.
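As an alternative to the regex or item-merging tricks, a small character-by-character splitter can honour quotes. This is a hedged sketch with a hypothetical `QuotedCsvSplitter` class; it follows the common CSV convention that a quote inside a quoted field is written twice (`""`), and it assumes no quoted field spans multiple lines, which real CSV libraries do handle.

```java
import java.util.ArrayList;
import java.util.List;

class QuotedCsvSplitter {
    // Splits one CSV line into fields, honouring double quotes:
    //  - a comma inside "..." does not end the field
    //  - "" inside a quoted field stands for one literal quote
    // Assumes a quoted field never spans multiple lines.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote: keep one, skip the other
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // any character, including commas
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString()); // field boundary
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // the last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        List<String> items = split("Tom,\"Jackson, Jr.\",32,true");
        System.out.println(items.size() + " fields, surname = " + items.get(1));
    }
}
```

Because the splitter always returns exactly one entry per field, items[0] to items[3] keep their meaning even when the surname contains a comma.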



Answer 4:

Use OpenCSV

Here is a complete example that reads entries and adds them to a List:

import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

import au.com.bytecode.opencsv.CSVReader;

public class CSVReaderImplementor {
  private String fileName;
  private List<String[]> entries;

  public CSVReaderImplementor(String fileName) throws IOException, FileNotFoundException {
    this.fileName = fileName;
    // read every line of the file into a String[] and close the reader afterwards
    CSVReader reader = new CSVReader(new FileReader(this.fileName));
    try {
      entries = reader.readAll();
    } finally {
      reader.close();
    }
  }

  public List<String[]> getEntries() {
    return entries;
  }

  public static void main(String[] args) throws FileNotFoundException, IOException {
    CSVReaderImplementor cri = new CSVReaderImplementor("yourfile.csv");

    // print every entry; Arrays.toString shows the fields instead of the array's hash code
    for (String[] entry : cri.getEntries()) {
      System.out.println(Arrays.toString(entry));
    }
  }
}

A List of String[] is returned. For each entry in the list, you can use the values at each index of the array to populate your bean (through its constructor or its setters).
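A minimal sketch of that last step, turning rows shaped like the output of `readAll()` into beans. It is self-contained (no opencsv dependency): the class `RowsToPersons` and the four-argument `Person` constructor are hypothetical, since the question's `Person` only has a no-arg constructor and setters. The first row is assumed to be the header.

```java
import java.util.ArrayList;
import java.util.List;

class RowsToPersons {
    static class Person {
        final String name;
        final String surname;
        final double shoeSize;
        final boolean sex; // true: male, false: female

        // Hypothetical all-args constructor (the question's Person only has setters)
        Person(String name, String surname, double shoeSize, boolean sex) {
            this.name = name;
            this.surname = surname;
            this.shoeSize = shoeSize;
            this.sex = sex;
        }
    }

    // Converts one String[] per CSV line into Person objects,
    // skipping the first row, which is assumed to be the header.
    static List<Person> toPersons(List<String[]> rows) {
        List<Person> persons = new ArrayList<>();
        for (int i = 1; i < rows.size(); i++) {
            String[] row = rows.get(i);
            persons.add(new Person(
                    row[0].trim(),
                    row[1].trim(),
                    Double.parseDouble(row[2].trim()),
                    Boolean.parseBoolean(row[3].trim())));
        }
        return persons;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"name", "surname", "shoesize", "sex"});
        rows.add(new String[] {"Tom", "Tommy", "32", "true"});
        rows.add(new String[] {"Anna", "Anny", "27", "false"});
        for (Person p : toPersons(rows)) {
            System.out.println(p.name + " " + p.surname + " " + p.shoeSize + " " + p.sex);
        }
    }
}
```

With the question's class you would instead create `new Person()` and call `setName`, `setSurname`, `setShoeSize` and `setSex` with the same converted values.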



Answer 5:

opencsv is a good and simple solution. It is a small but powerful library. You can download it from the opencsv website (direct download from SourceForge; use the jar in the deploy directory) or use Maven.

The Java bean mapping feature makes it really simple, because your CSV column names match the property names of your class (differences in capitalisation are ignored).

How to use it:

Reader reader = // ... reader for the input file

// let it map the csv column headers to properties
CsvToBean<Person> csvPersons = new CsvToBean<Person>();
HeaderColumnNameMappingStrategy<Person> strategy = new HeaderColumnNameMappingStrategy<Person>();
strategy.setType(Person.class);

// parse the file and get a list of persons
List<Person> persons = csvPersons.parse(strategy, reader);

That's all.
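To illustrate what header-based mapping does conceptually (this is not opencsv's actual implementation), here is a stdlib-only sketch with a hypothetical `HeaderIndex` class. It indexes the header row case-insensitively, so a property name such as `shoeSize` finds the `shoesize` column regardless of where that column appears in the file.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class HeaderIndex {
    private final Map<String, Integer> columns = new HashMap<>();

    // Records the position of every header column, keyed case-insensitively
    HeaderIndex(String[] header) {
        for (int i = 0; i < header.length; i++) {
            columns.put(header[i].trim().toLowerCase(Locale.ROOT), i);
        }
    }

    // Looks up a row value by bean property name, e.g. "shoeSize" matches "shoesize"
    String get(String[] row, String property) {
        Integer idx = columns.get(property.toLowerCase(Locale.ROOT));
        if (idx == null) {
            throw new IllegalArgumentException("No column for property: " + property);
        }
        return row[idx];
    }

    public static void main(String[] args) {
        HeaderIndex index = new HeaderIndex("name,surname,shoesize,sex".split(","));
        String[] row = "Tom,Tommy,32,true".split(",");
        System.out.println(index.get(row, "shoeSize")); // prints 32
        System.out.println(index.get(row, "name"));     // prints Tom
    }
}
```

Because every lookup goes through the header, reordering the columns in the file does not break the mapping, which is the main advantage over index-based parsing.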



Answer 6:

I think Super CSV + Dozer is easy to use and quite robust for Java bean CSV serialization:

http://supercsv.sourceforge.net/dozer.html