C# Script in SSIS Script Task to convert Excel Col

2020-04-12 08:35发布

问题:

I am exporting Data from SQL Server to Excel, utilizing SSIS Data Flow Task. Here all columns appear as Text despite export formatting. Hence I need to develop a SSIS Script task to do the necessary conversion. I am facing trouble in developing the script.

Excel Workbook before Formatting

See, the Excel Cell has no Apostrophe and the Number type is also "General" but the message says The number in this cell is formatted as text or preceded by an apostrophe

I have Tried different options available in the internet, but unsuccessfully.

#region Namespaces
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Text.RegularExpressions;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.Runtime.InteropServices;
using Excel = Microsoft.Office.Interop.Excel;
#endregion

namespace ST_de899f405b7b4083b0ad8cba6b3df2e3
{

[Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
{
    public void Main()
    {
        string inputFile = (string)Dts.Variables["Target_FullFilePath"].Value;
        Excel.Application ExcelApp = new Excel.Application();
        Excel.Workbook ExcelWorkbook = ExcelApp.Workbooks.Open(inputFile);
        Excel.Range formatRange;
        ExcelApp.Visible = true;

        foreach (Excel.Worksheet ExcelWorksheet in ExcelWorkbook.Sheets)
        {
            ExcelWorksheet.Select(Type.Missing);
            ExcelWorksheet.Columns[2].NumberFormat = "";
            ExcelWorksheet.Columns[3].NumberFormat = "";
            ExcelWorksheet.Columns[4].NumberFormat = "0.00000";
            ExcelWorksheet.Columns[5].NumberFormat = "yyyy-MM-dd";
        }

        ExcelWorkbook.Save();

        GC.Collect();
        GC.WaitForPendingFinalizers();

        ExcelWorkbook.Close(Type.Missing, Type.Missing, Type.Missing);
        Marshal.FinalReleaseComObject(ExcelWorkbook);

        ExcelApp.Quit();
        Marshal.FinalReleaseComObject(ExcelApp);
    }   
    enum ScriptResults
    {
        Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
        Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
    };
    #endregion
}
}

Expected Result: Columns numbered B, C, D to look like decimal/integer numbers and also similarly filtered. Column E to look like Date and also similarly filtered.

This is how I want Excel file to look like, after formatting through SSIS

I confirm the corresponding columns have relevant values only except column header.

回答1:

Before providing the solution, i have to explain some points about Excel Number Format

What is Number Format property?

Referring to Number format codes documentation:

You can use number formats to change the appearance of numbers, including dates and times, without changing the actual number. The number format does not affect the cell value that Excel uses to perform calculations. The actual value is displayed in the formula bar.

What is General Number format?

Referring to Reset a number to the General format documentation:

The General format is the default number format that Excel applies when you type a number. For the most part, numbers that are formatted with the General format are displayed just the way that you type them.

How Date are stored in Excel?

Referring to How Dates Work in Excel:

The dates in Excel are actually stored as numbers, and then formatted to display the date.

Your excepted result

You mentioned that:

Expected Result: Columns numbered 16, 17, 22 to be converted to "General" and look like decimal numbers. Column 31 to be converted to "General" and look like Date.

Based on what we mentioned you cannot convert column 31 to "General" and make it look like Date.

Solution

You just need to set NumberFormat property to an empty string to set it as "General"

ExcelWorksheet.Columns[16].NumberFormat = "";

Experiments

I Created an Excel file with 4 columns: NumberColumn, DateColumn, DecimalColumn and StringColumn as shown in the image above:

I created a console application with the following code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Text.RegularExpressions;
using Excel = Microsoft.Office.Interop.Excel;
using System.Runtime.InteropServices;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            string inputFile = @"D:\Test.xlsx";

            Excel.Application ExcelApp = new Excel.Application();
            Excel.Workbook ExcelWorkbook = ExcelApp.Workbooks.Open(inputFile);
            ExcelApp.Visible = true;

            foreach (Excel.Worksheet ExcelWorksheet in ExcelWorkbook.Sheets)
            {
                ExcelWorksheet.Select(Type.Missing);

                ExcelWorksheet.Columns[1].NumberFormat = "";
                ExcelWorksheet.Columns[2].NumberFormat = "yyyy-MM-dd"; // convert format to date
                ExcelWorksheet.Columns[2].NumberFormat = "";
                ExcelWorksheet.Columns[3].NumberFormat = "0.00000"; // convert format to decimal with 5 decimal digits
                ExcelWorksheet.Columns[3].NumberFormat = "";
                ExcelWorksheet.Columns[4].NumberFormat = "";


            }
            ExcelWorkbook.Save();

            GC.Collect();
            GC.WaitForPendingFinalizers();

            ExcelWorkbook.Close(Type.Missing, Type.Missing, Type.Missing);
            Marshal.FinalReleaseComObject(ExcelWorkbook);

            ExcelApp.Quit();
            Marshal.FinalReleaseComObject(ExcelApp);
        }
    }
}

After executing the application, the Excel looked like the following:

Discussion and Conclusion

From the image above, we can see that all columns are changed to General Number format, but if values are stored as numbers they will be shown as they are stored: Date values are shown as Excel serials (numbers), decimal values are shown with only one decimal digit, even if we changed the format to five digits before resetting the format to General.

In Brief, you cannot handle how the values are shown when the Number Format is "General", if you need to show values as dates you have to set the number format to yyyy-MM-dd or any other date format.

Reference

  • C# Accessing EXCEL, formatting cell as General

Update 1

Instead of using ExcelWorksheet.Columns[1].NumberFormat, try using the following code:

ExcelWorksheet.Cells[1,1].EntireColumn.NumberFormat = "";
ExcelWorksheet.Cells[1,2].EntireColumn.NumberFormat = "";