I am working with a client to import a rather larger Excel file (over 37K rows) into a custom system and utilizing the excellent LinqToExcel library to do so. While reading all of the data in, I noticed it was breaking on records about 80% in and dug a little further. The reason it fails is the majority of records (with associated dates ranging 2011 - 2015) are normal, e.g. 1/3/2015
, however starting in 2016, the structure changes to look like this: '1/4/2016
(note the "tick" at the beginning of the date) and LinqToExcel starts returning a DBNull for that column.
Any ideas on why it would do that and ways around it? Note that this isn't a casting issue - I can use the Immediate Window to see all the values of the LinqToExcel.Row
value and where that column index is, it's empty.
Edit
Here is the code I am using to read in the file:
var excel = new LinqToExcel.ExcelQueryFactory(Path.Combine(this.FilePath, this.CurrentFilename));
foreach (var row in excel.Worksheet(file.WorksheetName))
{
data.Add(this.FillEntity(row));
}
The problem I'm referring to is inside the row
variable, which is a LinqToExcel.Row
instance and contains the raw data from Excel. The values inside row
all line up, with the exception of the column for the date which is empty.
** Edit 2 **
I downloaded the LinqToExcel code from GitHub and connected it to my project and it looks like the issue is even deeper than this library. It uses an IDataReader
to read in all of the values and the cells in question that aren't being read are empty from that level. Here is the block of code from the
LinqToExcel.ExcelQueryExecutor
class that is failing:
private IEnumerable<object> GetRowResults(IDataReader data, IEnumerable<string> columns)
{
var results = new List<object>();
var columnIndexMapping = new Dictionary<string, int>();
for (var i = 0; i < columns.Count(); i++)
columnIndexMapping[columns.ElementAt(i)] = i;
while (data.Read())
{
IList<Cell> cells = new List<Cell>();
for (var i = 0; i < columns.Count(); i++)
{
var value = data[i];
//I added this in, since the worksheet has over 37K rows and
//I needed to snag right before it hit the values I was looking for
//to see what the IDataReader was exposing. The row inside the
//IDataReader relevant to the column I'm referencing is null,
//even though the data definitely exists in the Excel file
if (value.GetType() == typeof(DateTime) && value.Cast<DateTime>() == new DateTime(2015, 12, 31))
{
}
value = TrimStringValue(value);
cells.Add(new Cell(value));
}
results.CallMethod("Add", new Row(cells, columnIndexMapping));
}
return results.AsEnumerable();
}
Since their class uses an OleDbDataReader
to retrieve the results, I think that is what can't find the value of the cell in question. I don't even know where to go from there.