I need a workaround for Excel Guessing Data Types

2019-01-12 01:30发布

问题:

I'm creating a utility to import data from Excel to Oracle database,

I have a fixed template for the excel file,

Now, when I'm trying to import the data by Jet provider and ADO.Net - Ole connection tools, I found the following problem: there're some columns haven't been imported because there are mixed data types in their columns [string and number],

I looked for this problem on the internet I found the reason is guessing data types from Excel

The load code:

connection = new OleDbConnection(@"Provider=Microsoft.Jet.OLEDB.4.0; Data Source={0};Extended Properties=Excel 8.0;");
string columns = "P_ID, FULL_NAME_AR, job_no, GENDER, BIRTH_DATE, RELIGION, MARITAL_STATUS, NAT_ID, JOB_Name, FIRST_HIRE_DATE, HIRE_DATE, CONTRACT_TYPE, GRADE_CODE, QUALIFICATION";
string sheetName = "[Emps$]";
OleDbCommand command = new OleDbCommand(string.Format("select {0} from {1} where p_id is not null", columns, sheetName), connection);

connection.Open();
dr = command.ExecuteReader();
DataTable table = new DataTable();
table.Load(dr);

What should I do to tell Excel STOP GUESSING and give me the data as Text ?

if there isn't, can you help me with any workarounds ?

Thanks in advance

回答1:

I found a solution by adding IMEX=1 for the connection string, but there's a special format for it which descriped in the following link.

The IMEX parameter is for columns that use mixed numeric and alpha values. The Excel driver will typically scan the first several rows in order to determine what data type to use for each column. If a column is determined to be numeric based upon a scan of the first several rows, then any rows with alpha characters in this column will be returned as Null. The IMEX parameter (1 is input mode) forces the data type of the column to text so that alphanumeric values are handled properly.

Regards



回答2:

Can you work from the excel end? This example run in Excel will put mixed data tyoes into an SQL Server table:

Dim cn As New ADODB.Connection

scn = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" _
& sFullName _
& ";Extended Properties=""Excel 8.0;HDR=Yes;IMEX=1"";"

cn.Open scn

s = "SELECT Col1, Col2, Col3 INTO [ODBC;Description=TEST;DRIVER=SQL Server;" _
& "SERVER=Some\Instance;Trusted_Connection=Yes;" _
& "DATABASE=test].TableZ  FROM [Sheet1$]"
cn.Execute s


回答3:

This isn't completely right! Apparently, Jet/ACE ALWAYS assumes a string type if the first 8 rows are blank, regardless of IMEX=1, and always uses a numeric type if the first 8 rows are numbers (again, regardless of IMEX=1). Even when I made the rows read to 0 in the registry, I still had the same problem. This was the only sure fire way to get it to work:

try
{
    Console.Write(wsReader.GetDouble(j).ToString());
}
catch   //Lame unfixable bug
{
    Console.Write(wsReader.GetString(j));
}



回答4:

An alternative solution is to add or change the setting TypeGuessRows in the registry. By setting its value to 0, the complete document will be scanned.

Unfortunately, the settings may be found on various locations in the registry, depending on the which libraries and versions of them you have installed.

For instance: [HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel] "TypeGuessRows"=dword:00000000

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel] "TypeGuessRows"=dword:00000000

This will also prevent truncation of textual data longer than 255 characters. This happens if you have a number for TypeGuessRows larger than 0 and the first text longer than 255 characters occurs beyond that number.

See also Setting TypeGuessRows for excel ACE Driver.