Convert a table with unknown structure into Key/Va

2019-08-29 05:16发布

问题:

The report data we receive from analysts come in Table format with arbitrary structure. All we know is that each row has a CustomerId column. But the others, we do not know and can vary every time.

The destination system that receives this data only does in Key/Value format so we have to convert the report tables into Key/Value.

So, if for instance, the source report table has the following structure:

CREATE TABLE [dbo].[SampleSourceTable](
    [CustomerId] [bigint] NULL,
    [Column1] [nchar](10) NULL,
    [Column2] [int] NULL,
    [Column3] [datetime] NULL
) ON [PRIMARY]
GO
INSERT [dbo].[SampleSourceTable] ([CustomerId], [Column1], [Column2], [Column3]) VALUES (1, N'aaa', 123, CAST(N'2019-01-01T00:00:00.000' AS DateTime))
GO
INSERT [dbo].[SampleSourceTable] ([CustomerId], [Column1], [Column2], [Column3]) VALUES (2, N'bbb', 456, CAST(N'2018-01-01T00:00:00.000' AS DateTime))
GO

We would like this data to be converted into the following structure:

CREATE TABLE [dbo].[SampleDestinationTable](
    [CustomerId] [bigint] NULL,
    [Attribute] [nvarchar](255) NULL,
    [Value] [nvarchar](max) NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
INSERT [dbo].[SampleDestinationTable] ([CustomerId], [Attribute], [Value]) VALUES (1, N'Column1', N'aaa')
GO
INSERT [dbo].[SampleDestinationTable] ([CustomerId], [Attribute], [Value]) VALUES (1, N'Column2', N'123')
GO
INSERT [dbo].[SampleDestinationTable] ([CustomerId], [Attribute], [Value]) VALUES (1, N'Column3', N'2019-01-01 00:00:00.000')
GO
INSERT [dbo].[SampleDestinationTable] ([CustomerId], [Attribute], [Value]) VALUES (2, N'Column1', N'bbb')
GO
INSERT [dbo].[SampleDestinationTable] ([CustomerId], [Attribute], [Value]) VALUES (2, N'Column2', N'456')
GO
INSERT [dbo].[SampleDestinationTable] ([CustomerId], [Attribute], [Value]) VALUES (2, N'Column3', N'2018-01-01 00:00:00.000')
GO

The challenge here, however, is that the source report table does not have a fixed structure.

At first, I thought about going through every row using a cursor and then using a nested cursor go through all the columns in that row. But apparently, there is no way of processing a row with an unknown structure using cursors. So for now, I am wondering if this is possible using PIVOT/UNPIVOT. But then again, I think they also require the column list.

I am running SQL Server 2017.

How do I do transform the data with an unknown structure?

回答1:

One possible approach is to generate a dynamic statement using information from INFORMATION_SCHEMA.COLUMNS:

-- Declarations
DECLARE @stm nvarchar(max)

-- Dynamic part 
SELECT 
    @stm = STUFF((
        SELECT CONCAT(
            N' UNION ALL SELECT CustomerID, ''', 
            [COLUMN_NAME],
            N''' AS [Attribute], CONVERT(nvarchar(max), ',
            QUOTENAME([COLUMN_NAME]),
            CASE 
                WHEN DATA_TYPE = 'datetime' THEN N', 121'
                -- Add additional conversion rules for other data types
                ELSE N''
            END,
            N') AS [Value]', 
            N' FROM [SampleSourceTable]'
        )
        FROM INFORMATION_SCHEMA.COLUMNS
        WHERE (TABLE_NAME = 'SampleSourceTable') AND (COLUMN_NAME <> 'CustomerId')
    FOR XML PATH('')
    ), 1, 11, N'')

-- Whole statement and execution
SET @stm = @stm + N'ORDER BY CustomerID'
PRINT @stm 
EXEC (@stm)

Output:

CustomerID  Attribute   Value
1           Column1     aaa       
1           Column2     123
1           Column3     2019-01-01 00:00:00.000
2           Column3     2018-01-01 00:00:00.000
2           Column2     456
2           Column1     bbb       


回答2:

Ah, you opened a second question, I just placed an answer at your first...

So I will use this place to provide the same technique as my other answer, but without any need of dynamically created SQL. Try this out:

DECLARE @xml XML =(SELECT TOP 10 o.object_id,o.* FROM sys.objects o FOR XML RAW, ELEMENTS XSINIL);

SELECT r.value('*[1]/text()[1]','nvarchar(max)') AS RowID
        ,c.value('local-name(.)','nvarchar(max)') AS ColumnKey
        ,c.value('text()[1]','nvarchar(max)') AS ColumnValue
FROM @xml.nodes('/row') A(r)
CROSS APPLY A.r.nodes('*[position()>1]') B(c);

The first column of the set will be returned as RowID. If this is not correct, you can force this by doing the same as I've done above to force the o.object_id in the first place. All Columns of your result will be returned as EAV.

Part of the result

+-------+---------------------+-------------------------+
| RowID | ColumnKey           | ColumnValue             |
+-------+---------------------+-------------------------+
| 3     | name                | sysrscols               |
+-------+---------------------+-------------------------+
| 3     | object_id           | 3                       |
+-------+---------------------+-------------------------+
| 3     | principal_id        | NULL                    |
+-------+---------------------+-------------------------+
| 3     | schema_id           | 4                       |
+-------+---------------------+-------------------------+
| 3     | parent_object_id    | 0                       |
+-------+---------------------+-------------------------+
| 3     | type                | S                       |
+-------+---------------------+-------------------------+
| 3     | type_desc           | SYSTEM_TABLE            |
+-------+---------------------+-------------------------+
| 3     | create_date         | 2017-08-22T19:38:02.860 |
+-------+---------------------+-------------------------+
| 3     | modify_date         | 2017-08-22T19:38:02.867 |
+-------+---------------------+-------------------------+
| 3     | is_ms_shipped       | 1                       |
+-------+---------------------+-------------------------+
| 3     | is_published        | 0                       |
+-------+---------------------+-------------------------+
| 3     | is_schema_published | 0                       |
+-------+---------------------+-------------------------+
| 5     | name                | sysrowsets              |
+-------+---------------------+-------------------------+
| ... more rows ...