LinqToExcel Not Parsing Date

2019-06-13 17:12发布

问题:

I am working with a client to import a rather larger Excel file (over 37K rows) into a custom system and utilizing the excellent LinqToExcel library to do so. While reading all of the data in, I noticed it was breaking on records about 80% in and dug a little further. The reason it fails is the majority of records (with associated dates ranging 2011 - 2015) are normal, e.g. 1/3/2015, however starting in 2016, the structure changes to look like this: '1/4/2016 (note the "tick" at the beginning of the date) and LinqToExcel starts returning a DBNull for that column.

Any ideas on why it would do that and ways around it? Note that this isn't a casting issue - I can use the Immediate Window to see all the values of the LinqToExcel.Row value and where that column index is, it's empty.

Edit

Here is the code I am using to read in the file:

var excel = new LinqToExcel.ExcelQueryFactory(Path.Combine(this.FilePath, this.CurrentFilename));
foreach (var row in excel.Worksheet(file.WorksheetName))
{
    data.Add(this.FillEntity(row));
}

The problem I'm referring to is inside the row variable, which is a LinqToExcel.Row instance and contains the raw data from Excel. The values inside row all line up, with the exception of the column for the date which is empty.

** Edit 2 **

I downloaded the LinqToExcel code from GitHub and connected it to my project and it looks like the issue is even deeper than this library. It uses an IDataReader to read in all of the values and the cells in question that aren't being read are empty from that level. Here is the block of code from the LinqToExcel.ExcelQueryExecutorclass that is failing:

private IEnumerable<object> GetRowResults(IDataReader data, IEnumerable<string> columns)
    {
        var results = new List<object>();
        var columnIndexMapping = new Dictionary<string, int>();
        for (var i = 0; i < columns.Count(); i++)
            columnIndexMapping[columns.ElementAt(i)] = i;

        while (data.Read())
        {
            IList<Cell> cells = new List<Cell>();
            for (var i = 0; i < columns.Count(); i++)
            {
                var value = data[i];

                //I added this in, since the worksheet has over 37K rows and 
                //I needed to snag right before it hit the values I was looking for
                //to see what the IDataReader was exposing. The row inside the
                //IDataReader relevant to the column I'm referencing is null,
                //even though the data definitely exists in the Excel file
                if (value.GetType() == typeof(DateTime) && value.Cast<DateTime>() == new DateTime(2015, 12, 31))
                {
                }



                value = TrimStringValue(value);
                cells.Add(new Cell(value));
            }
            results.CallMethod("Add", new Row(cells, columnIndexMapping));
        }
        return results.AsEnumerable();
    }

Since their class uses an OleDbDataReader to retrieve the results, I think that is what can't find the value of the cell in question. I don't even know where to go from there.

回答1:

Found it! Once I traced down that it was the OleDbDataReader that was failing and not the LinqToExcel library itself, it sent me down a different path to look around. Apparently, when an Excel file is read by an OleDbDataReader (as virtually all utilities do under the covers), the first few records are scanned to determine the type of content associated with the column. In my scenario, over 20K records had "normal" dates, so it assumed everything was a date. Once it got to the "bad" records, the ' in front of the date meant it couldn't be parsed into a date, so the value was null.

To circumvent this, I load the file and tell it to ignore column headers. Since the header for this column is a string and most of the values are dates, it makes everything a string because of the mismatched types and the values I need are loaded properly. From there, I can parse accordingly and get it to work.

Source: What is IMEX in the OLEDB connection string?