Microsoft.Office.Interop.Excel or EPPlus for read

2019-01-29 05:19发布

问题:

I wrote a code to read a column from a Excel file. I use Microsoft.Office.Interop.Excel on this, first read the entire Range and then write in System.Array after that I do some operations with the System.Array values and finally I convert it to List because I fill a ListBox element. This is the code (only relevant parts):

private List<string> bd = new List<string>();
private static System.Array objRowAValues;

private List<string> bl = new List<string>();
private static System.Array objRowBValues;

private List<string> cm = new List<string>();
private static System.Array objRowCValues;

private List<string> pl = new List<string>();
private List<string> bdCleanList;
private static Microsoft.Office.Interop.Excel.Application appExcel;

Excel.Application xlApp;
Excel.Workbook xlWorkBook;
Excel.Worksheet xlWorkSheet;
Excel.Range rngARowLast, rngBRowLast, rngCRowLast;

long lastACell, lastBCell, lastCCell, fullRow;

private void btnCargarExcel_Click(object sender, EventArgs e)
    {
        if (this.openFileDialog1.ShowDialog() == DialogResult.OK)
        {
            if (System.IO.File.Exists(openFileDialog1.FileName))
            {
                Stopwatch stopWatch = new Stopwatch();
                stopWatch.Start();
                Thread.Sleep(10000);

                filePath.Text = openFileDialog1.FileName.ToString();

                xlApp = new Microsoft.Office.Interop.Excel.Application();
                xlWorkBook = xlApp.Workbooks.Open(openFileDialog1.FileName, 0, true, 5, "", "", true,
                                                  Microsoft.Office.Interop.Excel.XlPlatform.xlWindows, "\t", false,
                                                  false, 0, true, 1, 0);
                xlWorkSheet = (Excel.Worksheet)xlWorkBook.Worksheets.get_Item(1);

                fullRow = xlWorkSheet.Rows.Count;
                lastACell = xlWorkSheet.Cells[fullRow, 1].End(Excel.XlDirection.xlUp).Row;
                rngARowLast = xlWorkSheet.get_Range("A1", "A" + lastACell);
                objRowAValues = (System.Array)rngARowLast.Cells.Value;

                foreach (object elem in objRowAValues)
                {
                    if (elem != "")
                    {
                        bd.Add(cleanString(elem.ToString(), 10));
                    }
                }

                nrosProcesados.Text = bd.Count().ToString();
                listBox1.DataSource = bd;

                xlWorkBook.Close(true, null, null);
                xlApp.Quit();

                releaseObject(xlWorkSheet);
                releaseObject(xlWorkBook);
                releaseObject(xlApp);

                stopWatch.Stop();

                TimeSpan ts = stopWatch.Elapsed;
                executiontime.Text =
                    String.Format("{0:00}:{1:00}:{2:00}.{3:00}", ts.Hours, ts.Minutes, ts.Seconds,
                                  ts.Milliseconds / 10).ToString();
            }
            else
            {
                MessageBox.Show("No se pudo abrir el fichero!");
                System.Runtime.InteropServices.Marshal.ReleaseComObject(appExcel);
                appExcel = null;
                System.Windows.Forms.Application.Exit();
            }
        }
    }

I test with a Excel file with ~800 000 cells and take less than 2 minutes. Then I test samples from EPPlus and are faster than my approach so I think in use EPPlus instead of Microsoft.Office.Interop.Excel I think also in use OpenXML SDK (but can't find any example to suite my goals so I leave for now). In the example they use this code to read from a Excel file:

ExcelWorksheet sheet = package.Workbook.Worksheets[1];

var query1= (from cell in sheet.Cells["d:d"] where cell.Value is double && (double)cell.Value >= 9990 && (double)cell.Value <= 10000 select cell);

of course they use LINQ here, but my questions regarding this topic are:

  • Which approach did yours use?
  • What are your recommendations on this?
  • Any help to write the same using EPPlus or OpenXML SDK?

I'm newbie in C# world coming from PHP world and this is my first project

回答1:

Which approach did yours use? -EPPlus

What are your recommendations on this? -I've found EPPLus to be hugely faster. It is also an easier API to work with in my opinion. For many reasons, one being the lack of COM interop(both for speed and ease of use). Also has less requirements, especially when deploying to a server environment: no installing Excel junk.

Any help to write the same using EPPlus or OpenXML SDK? -EPPlus API is fairly straightfoward. Make an attempt and post more specific questions with what you've tried so far.

Another way to loop through cells:

var firstColumnRows = sheet.Cells["A2:A"];

// Loop through rows in the first column, get values based on offset
foreach (var cell in firstColumnRows)
{
    var column1CellValue = cell.GetValue<string>();
    var neighborCellValue = cell.Offset(0, 1).GetValue<string>();
}