Does ADO.NET + massive INSERTs + Excel + C# = “A bad time”?

Posted 2019-05-31 23:27

Basically, I need to insert a lot of data into an Excel file. Creating an OleDB connection appears to be the fastest way, but I seem to have run into memory issues: the memory used by the process keeps growing as I execute INSERT queries. I've narrowed it down to the output to the Excel file (memory holds steady when I don't write to Excel). I close and reopen the connection between worksheets, but this has no effect on memory usage (nor does calling Dispose()). The data itself is written correctly, as I can verify with relatively small data sets. Any insight would be appreciated.

initializeADOConn() is called in the constructor.

initADOConnInsertComm() creates the parameterized INSERT query.

writeRecord() is called whenever a new record is written. New worksheets are created as needed.

using System;
using System.Data.OleDb;
using System.Reflection;

public bool initializeADOConn()
{
    /* Set up the connection string and connect. */
    string connectionString = @"Provider=Microsoft.Jet.OLEDB.4.0;" +
        "Data Source=" + this.destination + ";Extended Properties=\"Excel 8.0;HDR=YES;\"";
    // The constructor already assigns ConnectionString; no need to set it again.
    conn = new OleDbConnection(connectionString);
    conn.Open();

    /* Initialize the insert command. */
    initADOConnInsertComm();
    return true;
}

/* Reflect over FileListerFileInfo once instead of on every record.
 * GetProperties() returns properties in no guaranteed order, so this array
 * must line up with the parameter order set up in initADOConnInsertComm(). */
private static readonly PropertyInfo[] properties =
    typeof(FileListerFileInfo).GetProperties();

public override bool writeRecord(FileListerFileInfo file)
{
    /* If all available sheets are full, make a new one. */
    if (numWritten % EXCEL_MAX_ROWS == 0)
    {
        conn.Close();
        conn.Open();
        createNextSheet();
    }
    /* Count this record as written. */
    numWritten++;
    /* Copy each property of the record into the matching (positional)
     * parameter of the insert query. OLE DB needs DBNull.Value, not null,
     * for a missing value. */
    for (int i = 0; i < insertComm.Parameters.Count; i++)
        insertComm.Parameters[i].Value = properties[i].GetValue(file, null) ?? DBNull.Value;
    /* Add the record. */
    insertComm.ExecuteNonQuery();

    return true;
}
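The question doesn't show initADOConnInsertComm() or createNextSheet(). As a hypothetical sketch of what they might generate, the helpers below build the CREATE TABLE statement that makes the Jet provider add a worksheet, plus the matching parameterized INSERT. The helper names, the sample column names, and declaring every column as TEXT are assumptions for illustration. It is written as C# top-level statements and only builds strings, so it runs without the Jet provider installed.

```csharp
using System;
using System.Linq;

// Hypothetical: DDL that makes the Jet provider add a worksheet named sheetName.
// Every column is declared TEXT purely for illustration.
static string BuildCreateSheetSql(string sheetName, string[] columns)
{
    string columnDefs = string.Join(", ", columns.Select(c => "[" + c + "] TEXT"));
    return "CREATE TABLE [" + sheetName + "] (" + columnDefs + ")";
}

// Hypothetical: a parameterized INSERT. OleDbCommand binds parameters by
// position, so every value is a bare '?' placeholder.
static string BuildInsertSql(string sheetName, string[] columns)
{
    string columnList = string.Join(", ", columns.Select(c => "[" + c + "]"));
    string placeholders = string.Join(", ", columns.Select(_ => "?"));
    return "INSERT INTO [" + sheetName + "] (" + columnList + ") VALUES (" + placeholders + ")";
}

string[] cols = { "Name", "Length", "LastWriteTime" };
Console.WriteLine(BuildCreateSheetSql("Sheet2", cols));
// CREATE TABLE [Sheet2] ([Name] TEXT, [Length] TEXT, [LastWriteTime] TEXT)
Console.WriteLine(BuildInsertSql("Sheet2", cols));
// INSERT INTO [Sheet2] ([Name], [Length], [LastWriteTime]) VALUES (?, ?, ?)
```

Because the placeholders are positional, the column order used here has to match the order in which values are assigned to insertComm.Parameters in writeRecord().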

EDIT:

No, I do not use Excel at all. I'm intentionally avoiding Interop.Excel due to its poor performance (at least from my dabbles with it).

3 answers
(account suspended)
#2 · 2019-05-31 23:39

The answer is Yes, the formula you describe does equal a bad time.

If you have a database handy (SQL Server or Access are good for this), you can do all of your inserts into a database table, and then export the table all at once into an Excel spreadsheet.

Generally speaking, databases are good at handling lots of inserts, while spreadsheets aren't.
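If the database is Access (a Jet .mdb), the final export can be a single Jet SQL statement: SELECT ... INTO with a bracketed Excel connection string as the target. A minimal sketch, assuming a source table named FileList and the workbook path and sheet name shown (all hypothetical). It only builds the statement; in practice the string would be run with ExecuteNonQuery() on an OleDbConnection opened against the .mdb.

```csharp
using System;

// Hypothetical: copies an entire table from the connected Jet database into a
// new worksheet of an .xls workbook in one statement.
static string BuildExportToExcelSql(string sourceTable, string workbookPath, string sheetName)
{
    return "SELECT * INTO [Excel 8.0;Database=" + workbookPath + "].[" + sheetName + "] " +
           "FROM [" + sourceTable + "]";
}

Console.WriteLine(BuildExportToExcelSql("FileList", @"C:\temp\FileList.xls", "Files"));
// SELECT * INTO [Excel 8.0;Database=C:\temp\FileList.xls].[Files] FROM [FileList]
```

An .xls worksheet still holds at most 65,536 rows, so a larger table would need to be exported in slices, for example one SELECT ... INTO per sheet with a WHERE clause on a key range.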

成全新的幸福
#3 · 2019-05-31 23:50

Here are a couple of ideas:

Is the target workbook open? There is a known bug (Microsoft KB article "Memory leak occurs when you query an open Excel worksheet by using ActiveX Data Objects") which, IIRC, actually lies in the OLE DB provider for Jet (the one you are using), although the article itself doesn't confirm this.

Regardless, bulk insert would seem to be the way to go.

You could use the same Jet OLE DB provider to do this: all you need is a one-row table, and you could even fabricate one on the fly. To create a new Excel workbook, execute CREATE TABLE DDL with a non-existent .xls file in the connection string, and the provider will create the workbook for you, with a worksheet representing the table. Since you already have a connection to your Excel workbook, you could execute this:

CREATE TABLE [EXCEL 8.0;DATABASE=C:\MyFabricatedWorkbook;HDR=YES].OneRowTable 
(
   x FLOAT
);

(Even better, IMO, would be to fabricate a Jet database, i.e. an .mdb file.)

Use INSERT to create a dummy row:

INSERT INTO [EXCEL 8.0;DATABASE=C:\MyFabricatedWorkbook;HDR=YES].OneRowTable (x) 
   VALUES (0);

Then, still using your connection to your target workbook, you could use something similar to the following to create a derived table (DT1) of your values to INSERT in one hit:

INSERT INTO MyExcelTable (key_col, data_col)
SELECT DT1.key_col, DT1.data_col
FROM (
   SELECT 22 AS key_col, 'abc' AS data_col
   FROM [EXCEL 8.0;DATABASE=C:\MyFabricatedWorkbook;HDR=YES].OneRowTable
   UNION ALL
   SELECT 55 AS key_col, 'xyz' AS data_col
   FROM [EXCEL 8.0;DATABASE=C:\MyFabricatedWorkbook;HDR=YES].OneRowTable
   UNION ALL
   SELECT 99 AS key_col, 'efg' AS data_col
   FROM [EXCEL 8.0;DATABASE=C:\MyFabricatedWorkbook;HDR=YES].OneRowTable
) AS DT1;
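Writing that statement by hand for every batch gets tedious, so here is a hedged sketch of generating it in C# from a list of rows, using the answer's key_col/data_col columns. The function name and the (int, string) row shape are assumptions. Because the values are embedded as literals rather than parameters, single quotes in string values are doubled, which is how Jet SQL escapes them.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Builds one INSERT ... SELECT ... UNION ALL statement for a batch of rows,
// following the derived-table (DT1) pattern. oneRowTable must hold exactly
// one row; otherwise every inserted row is multiplied.
static string BuildBatchInsertSql(
    string targetTable, string oneRowTable, IReadOnlyList<(int Key, string Data)> rows)
{
    var sql = new StringBuilder();
    sql.Append("INSERT INTO ").Append(targetTable).Append(" (key_col, data_col)\n");
    sql.Append("SELECT DT1.key_col, DT1.data_col\nFROM (\n");
    for (int i = 0; i < rows.Count; i++)
    {
        if (i > 0) sql.Append("   UNION ALL\n");
        // Jet escapes an embedded single quote by doubling it.
        string data = rows[i].Data.Replace("'", "''");
        sql.Append("   SELECT ").Append(rows[i].Key.ToString(CultureInfo.InvariantCulture))
           .Append(" AS key_col, '").Append(data).Append("' AS data_col\n")
           .Append("   FROM ").Append(oneRowTable).Append('\n');
    }
    sql.Append(") AS DT1;");
    return sql.ToString();
}

var batch = new List<(int, string)> { (22, "abc"), (55, "xyz"), (99, "efg") };
Console.WriteLine(BuildBatchInsertSql("MyExcelTable",
    @"[EXCEL 8.0;DATABASE=C:\MyFabricatedWorkbook;HDR=YES].OneRowTable", batch));
```

Jet caps the length of a single SQL statement (roughly 64,000 characters according to the published Access specifications), so each batch should stay at a modest number of rows rather than a whole worksheet in one statement.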
萌系小妹纸
#4 · 2019-05-31 23:53

Instead of writing one record at a time, can you find a way to insert in bulk? I try not to use crazy DataSet stuff, but isn't there a way to make all your inserts happen locally first and then push them out in one fell swoop? Does this process open Excel in the background? Do these processes exit afterwards?
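One way to "make all your inserts happen local first" is to buffer records and flush them in batches that never straddle a worksheet boundary. A minimal sketch of the planning step, assuming rowsPerSheet plays the role of the question's EXCEL_MAX_ROWS (an .xls worksheet holds 65,536 rows, one of which is the header row when HDR=YES) and batchSize is an arbitrary tuning value; the function name is hypothetical. Each planned batch could then be written with a single statement, such as the derived-table INSERT from the previous answer.

```csharp
using System;
using System.Collections.Generic;

// Splits totalRecords into (sheet, start, count) batches: each sheet holds at
// most rowsPerSheet records, each batch at most batchSize records, and no
// batch crosses from one sheet into the next.
static List<(int Sheet, int Start, int Count)> PlanBatches(int totalRecords, int rowsPerSheet, int batchSize)
{
    if (rowsPerSheet <= 0) throw new ArgumentOutOfRangeException(nameof(rowsPerSheet));
    if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

    var plan = new List<(int Sheet, int Start, int Count)>();
    for (int start = 0; start < totalRecords; )
    {
        int sheet = start / rowsPerSheet;
        int sheetEnd = Math.Min((sheet + 1) * rowsPerSheet, totalRecords);
        int count = Math.Min(batchSize, sheetEnd - start);
        plan.Add((sheet, start, count));
        start += count;
    }
    return plan;
}

foreach (var b in PlanBatches(10, 4, 3))
    Console.WriteLine($"sheet {b.Sheet}: records {b.Start}..{b.Start + b.Count - 1}");
// sheet 0: records 0..2
// sheet 0: records 3..3
// sheet 1: records 4..6
// sheet 1: records 7..7
// sheet 2: records 8..9
```

Keeping the plan separate from the database calls makes it easy to tune batchSize without touching the code that actually talks to the provider.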
