The report data we receive from analysts come in Table format with arbitrary structure. All we know is that each row has a CustomerId
column. But the others, we do not know and can vary every time.
The destination system that receives this data only does in Key/Value format so we have to convert the report tables into Key/Value.
So, if for instance, the source report table has the following structure:
CREATE TABLE [dbo].[SampleSourceTable](
[CustomerId] [bigint] NULL,
[Column1] [nchar](10) NULL,
[Column2] [int] NULL,
[Column3] [datetime] NULL
) ON [PRIMARY]
GO
INSERT [dbo].[SampleSourceTable] ([CustomerId], [Column1], [Column2], [Column3]) VALUES (1, N'aaa', 123, CAST(N'2019-01-01T00:00:00.000' AS DateTime))
GO
INSERT [dbo].[SampleSourceTable] ([CustomerId], [Column1], [Column2], [Column3]) VALUES (2, N'bbb', 456, CAST(N'2018-01-01T00:00:00.000' AS DateTime))
GO
We would like this data to be converted into the following structure:
CREATE TABLE [dbo].[SampleDestinationTable](
[CustomerId] [bigint] NULL,
[Attribute] [nvarchar](255) NULL,
[Value] [nvarchar](max) NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
INSERT [dbo].[SampleDestinationTable] ([CustomerId], [Attribute], [Value]) VALUES (1, N'Column1', N'aaa')
GO
INSERT [dbo].[SampleDestinationTable] ([CustomerId], [Attribute], [Value]) VALUES (1, N'Column2', N'123')
GO
INSERT [dbo].[SampleDestinationTable] ([CustomerId], [Attribute], [Value]) VALUES (1, N'Column3', N'2019-01-01 00:00:00.000')
GO
INSERT [dbo].[SampleDestinationTable] ([CustomerId], [Attribute], [Value]) VALUES (2, N'Column1', N'bbb')
GO
INSERT [dbo].[SampleDestinationTable] ([CustomerId], [Attribute], [Value]) VALUES (2, N'Column2', N'456')
GO
INSERT [dbo].[SampleDestinationTable] ([CustomerId], [Attribute], [Value]) VALUES (2, N'Column3', N'2018-01-01 00:00:00.000')
GO
The challenge here, however, is that the source report table does not have a fixed structure.
At first, I thought about going through every row using a cursor and then using a nested cursor go through all the columns in that row. But apparently, there is no way of processing a row with an unknown structure using cursors. So for now, I am wondering if this is possible using PIVOT/UNPIVOT. But then again, I think they also require the column list.
I am running SQL Server 2017.
How do I do transform the data with an unknown structure?
One possible approach is to generate a dynamic statement using information from INFORMATION_SCHEMA.COLUMNS
:
-- Declarations
DECLARE @stm nvarchar(max)
-- Dynamic part
SELECT
@stm = STUFF((
SELECT CONCAT(
N' UNION ALL SELECT CustomerID, ''',
[COLUMN_NAME],
N''' AS [Attribute], CONVERT(nvarchar(max), ',
QUOTENAME([COLUMN_NAME]),
CASE
WHEN DATA_TYPE = 'datetime' THEN N', 121'
-- Add additional conversion rules for other data types
ELSE N''
END,
N') AS [Value]',
N' FROM [SampleSourceTable]'
)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE (TABLE_NAME = 'SampleSourceTable') AND (COLUMN_NAME <> 'CustomerId')
FOR XML PATH('')
), 1, 11, N'')
-- Whole statement and execution
SET @stm = @stm + N'ORDER BY CustomerID'
PRINT @stm
EXEC (@stm)
Output:
CustomerID Attribute Value
1 Column1 aaa
1 Column2 123
1 Column3 2019-01-01 00:00:00.000
2 Column3 2018-01-01 00:00:00.000
2 Column2 456
2 Column1 bbb
Ah, you opened a second question, I just placed an answer at your first...
So I will use this place to provide the same technique as my other answer, but without any need of dynamically created SQL. Try this out:
DECLARE @xml XML =(SELECT TOP 10 o.object_id,o.* FROM sys.objects o FOR XML RAW, ELEMENTS XSINIL);
SELECT r.value('*[1]/text()[1]','nvarchar(max)') AS RowID
,c.value('local-name(.)','nvarchar(max)') AS ColumnKey
,c.value('text()[1]','nvarchar(max)') AS ColumnValue
FROM @xml.nodes('/row') A(r)
CROSS APPLY A.r.nodes('*[position()>1]') B(c);
The first column of the set will be returned as RowID. If this is not correct, you can force this by doing the same as I've done above to force the o.object_id
in the first place. All Columns of your result will be returned as EAV.
Part of the result
+-------+---------------------+-------------------------+
| RowID | ColumnKey | ColumnValue |
+-------+---------------------+-------------------------+
| 3 | name | sysrscols |
+-------+---------------------+-------------------------+
| 3 | object_id | 3 |
+-------+---------------------+-------------------------+
| 3 | principal_id | NULL |
+-------+---------------------+-------------------------+
| 3 | schema_id | 4 |
+-------+---------------------+-------------------------+
| 3 | parent_object_id | 0 |
+-------+---------------------+-------------------------+
| 3 | type | S |
+-------+---------------------+-------------------------+
| 3 | type_desc | SYSTEM_TABLE |
+-------+---------------------+-------------------------+
| 3 | create_date | 2017-08-22T19:38:02.860 |
+-------+---------------------+-------------------------+
| 3 | modify_date | 2017-08-22T19:38:02.867 |
+-------+---------------------+-------------------------+
| 3 | is_ms_shipped | 1 |
+-------+---------------------+-------------------------+
| 3 | is_published | 0 |
+-------+---------------------+-------------------------+
| 3 | is_schema_published | 0 |
+-------+---------------------+-------------------------+
| 5 | name | sysrowsets |
+-------+---------------------+-------------------------+
| ... more rows ...