I wrote some code to read a column from an Excel file. I use Microsoft.Office.Interop.Excel for this: first I read the entire Range into a System.Array, then I do some operations on the System.Array values, and finally I convert them to a List because I use it to fill a ListBox. This is the code (only the relevant parts):
private List<string> bd = new List<string>();
private static System.Array objRowAValues;
private List<string> bl = new List<string>();
private static System.Array objRowBValues;
private List<string> cm = new List<string>();
private static System.Array objRowCValues;
private List<string> pl = new List<string>();
private List<string> bdCleanList;
private static Microsoft.Office.Interop.Excel.Application appExcel;
Excel.Application xlApp;
Excel.Workbook xlWorkBook;
Excel.Worksheet xlWorkSheet;
Excel.Range rngARowLast, rngBRowLast, rngCRowLast;
long lastACell, lastBCell, lastCCell, fullRow;
private void btnCargarExcel_Click(object sender, EventArgs e)
{
    if (this.openFileDialog1.ShowDialog() == DialogResult.OK)
    {
        if (System.IO.File.Exists(openFileDialog1.FileName))
        {
            Stopwatch stopWatch = new Stopwatch();
            stopWatch.Start();
            Thread.Sleep(10000); // these 10 s are included in the measured time
            filePath.Text = openFileDialog1.FileName;
            xlApp = new Microsoft.Office.Interop.Excel.Application();
            // Open the workbook read-only.
            xlWorkBook = xlApp.Workbooks.Open(openFileDialog1.FileName, 0, true, 5, "", "", true,
                Microsoft.Office.Interop.Excel.XlPlatform.xlWindows, "\t", false,
                false, 0, true, 1, 0);
            xlWorkSheet = (Excel.Worksheet)xlWorkBook.Worksheets.get_Item(1);
            // Find the last used row in column A.
            fullRow = xlWorkSheet.Rows.Count;
            lastACell = xlWorkSheet.Cells[fullRow, 1].End(Excel.XlDirection.xlUp).Row;
            rngARowLast = xlWorkSheet.get_Range("A1", "A" + lastACell);
            // Read the whole range in a single call.
            objRowAValues = (System.Array)rngARowLast.Cells.Value;
            foreach (object elem in objRowAValues)
            {
                // Empty cells come back as null, so check before calling ToString().
                if (elem != null && elem.ToString() != "")
                {
                    bd.Add(cleanString(elem.ToString(), 10));
                }
            }
            nrosProcesados.Text = bd.Count.ToString();
            listBox1.DataSource = bd;
            // Nothing was modified, so close without saving.
            xlWorkBook.Close(false, null, null);
            xlApp.Quit();
            releaseObject(xlWorkSheet);
            releaseObject(xlWorkBook);
            releaseObject(xlApp);
            stopWatch.Stop();
            TimeSpan ts = stopWatch.Elapsed;
            executiontime.Text =
                String.Format("{0:00}:{1:00}:{2:00}.{3:00}", ts.Hours, ts.Minutes, ts.Seconds,
                    ts.Milliseconds / 10);
        }
        else
        {
            MessageBox.Show("Could not open the file!");
            // appExcel is never assigned in the parts shown, so guard against null.
            if (appExcel != null)
            {
                System.Runtime.InteropServices.Marshal.ReleaseComObject(appExcel);
                appExcel = null;
            }
            System.Windows.Forms.Application.Exit();
        }
    }
}
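One detail of the loop above is easy to get wrong: empty cells in the array returned by Range.Value are null, so calling ToString() on them throws. Below is a minimal sketch of a null-safe conversion that runs without Excel. The class name ColumnValues and the method ToCleanList are made up for illustration, and CleanString is only a hypothetical stand-in for my cleanString helper, which is not shown here (I assume it trims the text and cuts it to a maximum length).

```csharp
using System;
using System.Collections.Generic;

static class ColumnValues
{
    // Hypothetical stand-in for the cleanString helper (not shown in the question):
    // trims the text and keeps at most maxLength characters.
    static string CleanString(string value, int maxLength)
    {
        string trimmed = value.Trim();
        return trimmed.Length <= maxLength ? trimmed : trimmed.Substring(0, maxLength);
    }

    // Copies the non-empty cells of a Range.Value-style array into a list.
    public static List<string> ToCleanList(Array cells, int maxLength)
    {
        var result = new List<string>(cells.Length);
        foreach (object cell in cells)
        {
            // Empty cells are null, so check before calling ToString().
            string text = cell == null ? "" : cell.ToString();
            if (text.Length > 0)
            {
                result.Add(CleanString(text, maxLength));
            }
        }
        return result;
    }

    static void Main()
    {
        // Simulate the 1-based, two-dimensional array that Interop returns.
        Array cells = Array.CreateInstance(typeof(object), new[] { 4, 1 }, new[] { 1, 1 });
        cells.SetValue("  12345678901234 ", 1, 1);
        cells.SetValue(null, 2, 1);
        cells.SetValue("", 3, 1);
        cells.SetValue(42.0, 4, 1);

        List<string> cleaned = ToCleanList(cells, 10);
        Console.WriteLine(string.Join(",", cleaned)); // prints: 1234567890,42
    }
}
```

With something like this, the loop in the click handler would reduce to bd = ColumnValues.ToCleanList(objRowAValues, 10);.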
I tested this with an Excel file of ~800,000 cells and it takes a little under 2 minutes. Then I tried the samples that come with EPPlus and they are faster than my approach, so I am thinking of using EPPlus instead of Microsoft.Office.Interop.Excel. I also considered the OpenXML SDK, but I couldn't find any example that suits my goals, so I have left it aside for now. The EPPlus example uses this code to read from an Excel file:
ExcelWorksheet sheet = package.Workbook.Worksheets[1];
var query1 = (from cell in sheet.Cells["d:d"]
              where cell.Value is double && (double)cell.Value >= 9990 && (double)cell.Value <= 10000
              select cell);
Of course, they use LINQ here, but my questions on this topic are:
- Which approach do you use?
- What are your recommendations on this?
- Can anyone help me write the same thing using EPPlus or the OpenXML SDK?
I'm a newbie in the C# world, coming from the PHP world, and this is my first project.
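For the last question, this is roughly what I imagine the EPPlus version would look like. It is only a minimal sketch, not something I have benchmarked: it assumes the EPPlus 4.x API (newer versions also need a license context set before use), and the class EpplusColumnReader and its methods ReadColumn and ReadFirstColumn are names I made up for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using OfficeOpenXml; // EPPlus NuGet package (assumption: 4.x API)

static class EpplusColumnReader
{
    // Hypothetical helper: returns the non-empty cells of one column as text.
    public static List<string> ReadColumn(ExcelWorksheet sheet, int column)
    {
        var values = new List<string>();

        // Dimension is null when the worksheet contains no cells at all.
        if (sheet.Dimension == null)
        {
            return values;
        }

        // Last used row of the whole sheet; empty cells in the column are skipped below.
        int lastRow = sheet.Dimension.End.Row;
        for (int row = 1; row <= lastRow; row++)
        {
            object value = sheet.Cells[row, column].Value;
            if (value != null && value.ToString().Length > 0)
            {
                values.Add(value.ToString());
            }
        }
        return values;
    }

    // Opens the workbook at path and reads column A of its first worksheet.
    public static List<string> ReadFirstColumn(string path)
    {
        using (var package = new ExcelPackage(new FileInfo(path)))
        {
            // First() avoids depending on whether the indexer is 0- or 1-based.
            ExcelWorksheet sheet = package.Workbook.Worksheets.First();
            return ReadColumn(sheet, 1);
        }
    }
}
```

In the click handler this would replace the whole Interop part with a call such as EpplusColumnReader.ReadFirstColumn(openFileDialog1.FileName), followed by the same cleanString step as before. No Excel process is started and there are no COM objects to release.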