Importing Excel Data Seems to Randomly Give Null V

2020-04-17 05:47发布

问题:

Using SSIS for Visual Studio 2017 for some excel file imports.

I've created a package with several loop containers that call to specific packages to handle some files. I have an issue with one particular package being executed in that it seemingly randomly decides the data for columns is NULL per excel file. I was/am under the impression that this is part of the registry setting for TypeGuessRows (changed initially to 0 then to 1000 as a test) located at

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel

The reason I think this is because the various files being brought in generally have the same data, but it seems that if the first few rows of columns in the source data contains only numbers, that the data with mixed values will not be brought in correctly. All other columns aside from this seems fine.

Looking at the source files, all have the same datatype. I've tried changing the registry TypeGuessRows value and ensured that the output column property was string-based instead of numerical. The connection string has IMEX=1

回答1:

This issue is related to the OLEDB provider used to read excel files: Since excel is not a database where each column has a specific data type, OLEDB provider tries to identify the dominant data types found in each column and replace all other data types that cannot be parsed with NULLs.

There are many articles found online discussing this issue and giving several workarounds (links listed below).

But after using SSIS for years, i can say that best practice is to convert excel files to csv files and read them using Flat File components.

Or, if you don't have the choice to convert excel to flat files then you can force excel connection manager to ignore headers from the first row bu adding HDR=NO to the connection string and adding IMEX=1 to tell the OLEDB provider to specify data types from the first row (which is the header - all string most of the time), in this case all columns are imported as string and no values are replaced with NULLs but you will lose the headers and a additional row (header row is imported).

If you cannot ignore the header row, just add a dummy row that contains dummy string values (example: aaa) after the header row and add IMEX=1 to the connection string.

Helpful links

  • SSIS Excel Data Import - Mixed data type in Rows
  • Mixed data types in Excel column
  • Importing data from Excel having Mixed Data Types in a column (SSIS)
  • Why SSIS always gets Excel data types wrong, and how to fix it!
  • EXCEL IN SSIS: FIXING THE WRONG DATA TYPES
  • IMEX= 1 extended properties in ssis


回答2:

So I fixed it. Or at least found a sufficient workaround that should help anyone in my situation. I think it has to do with the cache of SSIS. I ended up putting a sort function on the problem column so the records getting read as NULL for having a random data type are read first, and not being considered random. I will say, I tried this initially and it didn't work. Through a little experiment of making a new data flow in the same package I discovered that this solution actually does work, hence me thinking the cache was the issue. If anyone has any further questions on this, let me know.