(Alternate Title: ReadAllLines Analogue for Excel?: What's the Best Way to Load and Manipulate Excel Data?)
I would like to quickly crack open an excel sheet and perform text manipulations. I want the operation to work like ReadAllLines (https://msdn.microsoft.com/en-us/library/s2tte0y1(v=vs.110).aspx) but for Excel.
I found the following question which is on point but seven years old. Reading Excel files from C# (Furthermore, it is an historically significant question that is frozen. Moreover, I do not have 50 points so I would not be able to comment were it open.) I cut and pasted Robin Robinson's answer into Visual Studio, changing only the path:
var fileName = string.Format("{0}\\fileNameHere", Directory.GetCurrentDirectory());
var connectionString = string.Format("Provider=Microsoft.Jet.OLEDB.4.0; data source={0}; Extended Properties=Excel 8.0;", fileName);
var adapter = new OleDbDataAdapter("SELECT * FROM [workSheetNameHere$]", connectionString);
var ds = new DataSet();
adapter.Fill(ds, "anyNameHere");
var data = ds.Tables["anyNameHere"].AsEnumerable();
var query = data.Where(x => x.Field<string>("id") != string.Empty).Select(x =>
new
{
id= x.Field<string>("id"),
});
That fails on the Fill method with 'External table is not in the expected format.'
Are good programmers doing it this way today? i.e., should I investigate Jet and see if there are updates for the latest Excel, or is there a new and improved way?
This is lightening fast; I can't tell I'm not doing text manipulations with ReadAllLines and Regex. (Removed the text manipulation details).
How I got it to work is explained in the comments to the original question.
Not sure why it's so much faster than Interop.Excel. Inefficient coding? More efficient API?
Any insight is appreciated!
I cobbled together a prototype based on Jon Skeet's recommendation to use Interop, and an answer by juliusz on How do I import from Excel to a DataSet using Microsoft.Office.Interop.Excel?
However, there are two things I really don't like about this: First, it seems I have to type in all the column names. Second, it loads the entire spreadsheet into memory rather than reading as a stream (takes a minute even for relatively small spreadsheets.