Read Csv file encoding error

2019-04-30 19:52发布

I am using the following method for reading Csv file content:

    /// <summary>
    /// Reads data from a CSV file to a datatable
    /// </summary>
    /// <param name="filePath">Path to the CSV file</param>
    /// <returns>Datatable filled with data read from the CSV file</returns>
    public DataTable ReadCsv(string filePath)
    {
        if (string.IsNullOrEmpty(filePath))
        {
            log.Error("Invalid CSV file name.");
            return null;
        }

        try
        {
            DataTable dt = new DataTable();

            string folder = FileMngr.Instance.ExtractFileDir(filePath);
            string fileName = FileMngr.Instance.ExtractFileName(filePath);
            string connectionString = 
            string.Concat(@"Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=",
            folder, ";");

            using (OdbcConnection conn = 
                   new System.Data.Odbc.OdbcConnection(connectionString))
            {
                string selectCommand = string.Concat("select * from [", fileName, "]");
                using (OdbcDataAdapter da = new OdbcDataAdapter(selectCommand, conn))
                {
                    da.Fill(dt);
                }
            }

            return dt;
        }
        catch (Exception ex)
        {
            log.Error("Error loading CSV content", ex);
            return null;
        }
    }

This method works if I have a UTF-8 encoded Csv file with a schema.ini that looks something like this:

[Example.csv]
Format=Delimited(,)
ColNameHeader=True
MaxScanRows=2
CharacterSet=ANSI

If I have German characters in a Csv file with Unicode encoding, the method cannot read the data correctly.

What modifications can I make to the above method to read Unicode Csv files? If there is no way to do it this way, what Csv-reading code can you suggest?

2条回答
我命由我不由天
2楼-- · 2019-04-30 20:06

Try using CharacterSet=UNICODE in your schema.ini file. Although this is not documented on MSDN it works according to this thread on Microsoft Forums.

查看更多
我命由我不由天
3楼-- · 2019-04-30 20:18

Well, a very good and well-used streaming CSV reader is on CodeProject; that is the first thing I'd try... but it sounds like your encoding may be borked, which might not make it simple... of course, it could just be odbc that is breaking, in which case the above might work fine.

For simple CSV you could try parsing it yourself (string.Split etc), but there are enough edge-cases that a pre-rolled parser is worth using.

查看更多
登录 后发表回答