Open Excel, Parse Data?

2019-03-06 18:01发布

(Alternate Title: ReadAllLines Analogue for Excel?: What's the Best Way to Load and Manipulate Excel Data?)

I would like to quickly crack open an excel sheet and perform text manipulations. I want the operation to work like ReadAllLines (https://msdn.microsoft.com/en-us/library/s2tte0y1(v=vs.110).aspx) but for Excel.

I found the following question which is on point but seven years old. Reading Excel files from C# (Furthermore, it is an historically significant question that is frozen. Moreover, I do not have 50 points so I would not be able to comment were it open.) I cut and pasted Robin Robinson's answer into Visual Studio, changing only the path:

var fileName = string.Format("{0}\\fileNameHere", Directory.GetCurrentDirectory());
var connectionString = string.Format("Provider=Microsoft.Jet.OLEDB.4.0; data source={0}; Extended Properties=Excel 8.0;", fileName);

var adapter = new OleDbDataAdapter("SELECT * FROM [workSheetNameHere$]", connectionString);
var ds = new DataSet();

adapter.Fill(ds, "anyNameHere");

var data = ds.Tables["anyNameHere"].AsEnumerable();

var query = data.Where(x => x.Field<string>("id") != string.Empty).Select(x =>
            new 
            {
                id= x.Field<string>("id"),
            });

That fails on the Fill method with 'External table is not in the expected format.'

Are good programmers doing it this way today? i.e., should I investigate Jet and see if there are updates for the latest Excel, or is there a new and improved way?

标签: c# excel-2013
2条回答
神经病院院长
2楼-- · 2019-03-06 18:34

This is lightening fast; I can't tell I'm not doing text manipulations with ReadAllLines and Regex. (Removed the text manipulation details).

How I got it to work is explained in the comments to the original question.

Not sure why it's so much faster than Interop.Excel. Inefficient coding? More efficient API?

Any insight is appreciated!

        var path = string.Format(@"C:\Users\jlambert\Desktop\encryptedSSNs.xlsx");
        var connStr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + path + ";Extended Properties=Excel 12.0;";


        var adapter = new OleDbDataAdapter("SELECT * FROM [sheetName$]", connStr);
        var ds = new DataSet();

        adapter.Fill(ds, "anyNameHere");

        var data = ds.Tables["anyNameHere"].AsEnumerable();

        var query = data.Where(x => x.Field<string>("MRN") != string.Empty).Select(x =>
            new 
            {
                mrn = x.Field<string>("MRN"),
                ssn = x.Field<string>("ssn"),
            });

        foreach (var q in query)
        {
            Console.WriteLine(q);    
        }
        Console.ReadLine();
查看更多
Deceive 欺骗
3楼-- · 2019-03-06 18:59

I cobbled together a prototype based on Jon Skeet's recommendation to use Interop, and an answer by juliusz on How do I import from Excel to a DataSet using Microsoft.Office.Interop.Excel?

However, there are two things I really don't like about this: First, it seems I have to type in all the column names. Second, it loads the entire spreadsheet into memory rather than reading as a stream (takes a minute even for relatively small spreadsheets.

private static void Main(string[] args)
{
    var dataT = Import(@"C:\Users\jlambert\Desktop\dSmall_encrypted.xlsx");

    var data = dataT.AsEnumerable();
...
}
public static System.Data.DataTable Import(String path)
{
    var app = new Application();
    Workbook workBook = app.Workbooks.Open(path, 0, true, 5, "", "", true, XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);

    Worksheet workSheet = (Worksheet)workBook.ActiveSheet;

    int index = 0;
    object rowIndex = 2;

    System.Data.DataTable dt = new System.Data.DataTable();
    dt.Columns.Add("Facility_code");                    
    dt.Columns.Add("MRN");
    dt.Columns.Add("first_name");
    dt.Columns.Add("middle_name");
    dt.Columns.Add("last_name");
    dt.Columns.Add("address_line_1");
    dt.Columns.Add("address_line_2");
    dt.Columns.Add("city");
    dt.Columns.Add("state");
    dt.Columns.Add("zip");
    dt.Columns.Add("date_of_birth");
    dt.Columns.Add("gender");
    dt.Columns.Add("ssn");
    dt.Columns.Add("home_phone");
    dt.Columns.Add("work_phone");
    dt.Columns.Add("cell_phone");
    dt.Columns.Add("PCP");
    dt.Columns.Add("Practice Location");

    DataRow row;

    while (((Range)workSheet.Cells[rowIndex, 1]).Value2 != null)
    {
        rowIndex = 2 + index;
        row = dt.NewRow();
        row[0] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 1]).Value2);
        row[1] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 2]).Value2);
        row[2] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 3]).Value2);
        row[3] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 4]).Value2);
        row[4] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 5]).Value2);
        row[5] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 6]).Value2);
        row[6] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 7]).Value2);
        row[7] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 8]).Value2);
        row[8] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 9]).Value2);
        row[9] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 10]).Value2);
        row[10] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 11]).Value2);
        row[11] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 12]).Value2);
        row[12] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 13]).Value2);
        row[13] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 14]).Value2);
        row[14] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 15]).Value2);
        row[15] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 16]).Value2);
        row[16] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 17]).Value2);
        row[17] = Convert.ToString(((Range)workSheet.Cells[rowIndex, 18]).Value2);
        index++;
        dt.Rows.Add(row);
    }
    app.Workbooks.Close();
    return dt;
}
查看更多
登录 后发表回答