C# Script in SSIS Script Task to convert Excel Col

Posted 2020-04-12 08:23

I am exporting data from SQL Server to Excel using an SSIS Data Flow Task. All columns appear as text in the workbook regardless of the export formatting, so I need an SSIS Script Task to do the necessary conversion. I am having trouble developing the script.

Excel Workbook before Formatting

As you can see, the Excel cell has no apostrophe and the number format is "General", but the message says: "The number in this cell is formatted as text or preceded by an apostrophe."

I have tried different options found on the internet, without success.

#region Namespaces
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Text.RegularExpressions;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.Runtime.InteropServices;
using Excel = Microsoft.Office.Interop.Excel;
#endregion

namespace ST_de899f405b7b4083b0ad8cba6b3df2e3
{

[Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
{
    public void Main()
    {
        string inputFile = (string)Dts.Variables["Target_FullFilePath"].Value;
        Excel.Application ExcelApp = new Excel.Application();
        Excel.Workbook ExcelWorkbook = ExcelApp.Workbooks.Open(inputFile);
        Excel.Range formatRange;
        ExcelApp.Visible = true;

        foreach (Excel.Worksheet ExcelWorksheet in ExcelWorkbook.Sheets)
        {
            ExcelWorksheet.Select(Type.Missing);
            ExcelWorksheet.Columns[2].NumberFormat = "";
            ExcelWorksheet.Columns[3].NumberFormat = "";
            ExcelWorksheet.Columns[4].NumberFormat = "0.00000";
            ExcelWorksheet.Columns[5].NumberFormat = "yyyy-MM-dd";
        }

        ExcelWorkbook.Save();

        GC.Collect();
        GC.WaitForPendingFinalizers();

        ExcelWorkbook.Close(Type.Missing, Type.Missing, Type.Missing);
        Marshal.FinalReleaseComObject(ExcelWorkbook);

        ExcelApp.Quit();
        Marshal.FinalReleaseComObject(ExcelApp);
    }

    #region ScriptResults declaration
    enum ScriptResults
    {
        Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
        Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
    };
    #endregion
}
}

Expected result: columns B, C and D should look like decimal/integer numbers and filter as numbers. Column E should look like a date and filter as dates.

This is how I want the Excel file to look after formatting through SSIS:

What I want through Script in SSIS

I confirm that, apart from the column headers, the corresponding columns contain only relevant values.

1 answer
在下西门庆
#2 · 2020-04-12 08:47

Before providing the solution, I have to explain some points about Excel number formats.

What is the Number Format property?

Referring to the Number format codes documentation:

You can use number formats to change the appearance of numbers, including dates and times, without changing the actual number. The number format does not affect the cell value that Excel uses to perform calculations. The actual value is displayed in the formula bar.

What is the General number format?

Referring to the Reset a number to the General format documentation:

The General format is the default number format that Excel applies when you type a number. For the most part, numbers that are formatted with the General format are displayed just the way that you type them.

How are dates stored in Excel?

Referring to How Dates Work in Excel:

The dates in Excel are actually stored as numbers, and then formatted to display the date.
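
To make the serial-number idea concrete, here is a minimal sketch using .NET's DateTime.ToOADate and DateTime.FromOADate, which follow the same day-count convention as Excel's default 1900 date system for dates from 1 March 1900 onward. The names ExcelSerialDemo, DateToSerial, SerialToDate and SerialDemoProgram are hypothetical and not part of the original post.

```csharp
using System;
using System.Globalization;

public static class ExcelSerialDemo
{
    // Returns the serial number that Excel stores for the given date
    // (assumes the default 1900 date system and a date on or after 1900-03-01).
    public static double DateToSerial(DateTime date)
    {
        return date.ToOADate();
    }

    // Converts a stored serial number back to a DateTime.
    public static DateTime SerialToDate(double serial)
    {
        return DateTime.FromOADate(serial);
    }
}

public static class SerialDemoProgram
{
    public static void Main()
    {
        double serial = ExcelSerialDemo.DateToSerial(new DateTime(2020, 4, 12));

        // This is what a cell with the General format would display: 43933
        Console.WriteLine(serial.ToString(CultureInfo.InvariantCulture));

        // This is what the same value looks like with a yyyy-MM-dd format: 2020-04-12
        DateTime date = ExcelSerialDemo.SerialToDate(serial);
        Console.WriteLine(date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
    }
}
```

The stored value is the same in both lines of output; only the formatting applied to it differs, which is the point the quoted documentation makes.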

Your expected result

You mentioned that:

Expected Result: Columns numbered 16, 17, 22 to be converted to "General" and look like decimal numbers. Column 31 to be converted to "General" and look like Date.

Based on the points above, you cannot convert column 31 to "General" and make it look like a date.

Solution

You just need to set the NumberFormat property to an empty string to reset a column to "General":

ExcelWorksheet.Columns[16].NumberFormat = "";

Experiments

I created an Excel file with four columns: NumberColumn, DateColumn, DecimalColumn and StringColumn.

I created a console application with the following code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Text.RegularExpressions;
using Excel = Microsoft.Office.Interop.Excel;
using System.Runtime.InteropServices;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            string inputFile = @"D:\Test.xlsx";

            Excel.Application ExcelApp = new Excel.Application();
            Excel.Workbook ExcelWorkbook = ExcelApp.Workbooks.Open(inputFile);
            ExcelApp.Visible = true;

            foreach (Excel.Worksheet ExcelWorksheet in ExcelWorkbook.Sheets)
            {
                ExcelWorksheet.Select(Type.Missing);

                ExcelWorksheet.Columns[1].NumberFormat = ""; // reset to General
                ExcelWorksheet.Columns[2].NumberFormat = "yyyy-MM-dd"; // apply a date format
                ExcelWorksheet.Columns[2].NumberFormat = ""; // then reset to General
                ExcelWorksheet.Columns[3].NumberFormat = "0.00000"; // apply a decimal format with 5 decimal digits
                ExcelWorksheet.Columns[3].NumberFormat = ""; // then reset to General
                ExcelWorksheet.Columns[4].NumberFormat = ""; // reset to General
            }
            ExcelWorkbook.Save();

            GC.Collect();
            GC.WaitForPendingFinalizers();

            ExcelWorkbook.Close(Type.Missing, Type.Missing, Type.Missing);
            Marshal.FinalReleaseComObject(ExcelWorkbook);

            ExcelApp.Quit();
            Marshal.FinalReleaseComObject(ExcelApp);
        }
    }
}

After executing the application, the Excel file looked as described in the discussion below.

Discussion and Conclusion

The result shows that all columns are changed to the General number format, but values stored as numbers are shown as they are stored: date values are shown as Excel serials (numbers), and decimal values are shown with only one decimal digit, even though we changed the format to five digits before resetting it to General.

In brief, you cannot control how values are shown when the number format is "General". If you need to show values as dates, you have to set the number format to yyyy-MM-dd or any other date format.
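
Because the number format only controls display, a cell whose value is stored as text stays text whatever format is applied. One possible approach, which is not part of the original answer, is to parse each text value in the script and write the typed result back to the cell before setting the format. The sketch below covers only the parsing step and needs no Excel reference. CellValueParser, ToTypedValue and ParseDemoProgram are hypothetical names, and the code assumes that numbers use invariant-culture formatting and that dates arrive as yyyy-MM-dd text.

```csharp
using System;
using System.Globalization;

public static class CellValueParser
{
    // Returns a double for numeric text, a DateTime for yyyy-MM-dd text,
    // and the original string for anything else (for example, a column header).
    public static object ToTypedValue(string text)
    {
        if (text == null) return null;
        string trimmed = text.Trim();

        double number;
        if (double.TryParse(trimmed, NumberStyles.Float, CultureInfo.InvariantCulture, out number)
            && !double.IsNaN(number) && !double.IsInfinity(number))
        {
            return number;
        }

        DateTime date;
        if (DateTime.TryParseExact(trimmed, "yyyy-MM-dd", CultureInfo.InvariantCulture,
                                   DateTimeStyles.None, out date))
        {
            return date;
        }

        return text;
    }
}

public static class ParseDemoProgram
{
    public static void Main()
    {
        foreach (string s in new[] { "12.5", "2020-04-12", "Amount" })
        {
            object value = CellValueParser.ToTypedValue(s);
            // Prints: 12.5 -> Double, 2020-04-12 -> DateTime, Amount -> String
            Console.WriteLine("{0} -> {1}", s, value.GetType().Name);
        }
    }
}
```

In the Script Task, the returned value could then be assigned to the cell (a DateTime can be converted with ToOADate() first) before NumberFormat is set, so that the format applies to a real number or date rather than to text.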

Update 1

Instead of using ExcelWorksheet.Columns[1].NumberFormat, try using the following code:

ExcelWorksheet.Cells[1,1].EntireColumn.NumberFormat = "";
ExcelWorksheet.Cells[1,2].EntireColumn.NumberFormat = "";