C# extract formatted text from Excel through OLEDB

2020-04-20 12:08发布

问题:

I am reading data from a large excel file which has got formatted texts. I extract the data to DataTable object through oleDBConnection & GetOleDbSchemaTable. But the extracted data doesn't include any formatted information.

My requirement is, I need to extract only non-Strikethrough texts.

I don't have any issues while reading and everything is perfectly fine. But my extraction should be based on text format in excel which I am unable to find the solution.

Anything to be added in the connectionString which is as follows:

connectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + fileName + "; Extended Properties=\"Excel 12.0 Xml;HDR=YES;Mode=Read;ReadOnly=True;\"";

Constraints:

  1. Using OleDBConnection - No Interop

  2. Reading to DataTable Object

回答1:

To the best of my knowledge, you cannot do this with just the ACE driver. That being said, if the constraints are ever lifted, this is the tool to use: http://closedxml.codeplex.com/



回答2:

You should switch your approach to use open XML. It's .NET, not COM

Info: http://support.microsoft.com/kb/840817

Download SDK: http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=5124



回答3:

You have to use "Open Xml" to get to format information if you do not want to use Interop. There is no other choice. There is Biff which I mention only for correctness, but your connection string clearly indicates XML, so Biff is out.

Once you open through "Open Xml" then you can "simulate" reading the range/table/cells like a DataTable.

With your constraints, Only #1 will give you formatting information.