What is the best practice when importing and validating an XML file to a single table (flattened) in SQL Server ?
I've a XML file which contains about 15 complex types which are all related to a single parent element.
The SSIS design could look like this:
But it's getting very complicated with all those (15) joins.
Is it maybe a better idea to just write T-SQL code to :
1) Import the XML into a column which is of the type XML and is linked to a XSD-schema.
2) Use this code:
TRUNCATE TABLE XML_Import
INSERT INTO XML_Import(ImportDateTime, XmlData)
SELECT GETDATE(), XmlData
FROM
(
SELECT *
FROM OPENROWSET (BULK 'c:\XML-Data.xml', SINGLE_BLOB) AS XMLDATA
) AS FileImport (XMLDATA)
delete from dbo.UserFlat
INSERT INTO dbo.UserFlat
SELECT
user.value('(UserIdentifier)', 'varchar(8)') as UserIdentifier,
user.value('(Emailaddress)', 'varchar(70)') as Emailaddress,
businessaddress.value('(Fax)', 'varchar(70)') as Fax,
employment.value('(EmploymentData)', 'varchar(8)') as EmploymentData,
-- More values here ...
FROM
XML_Import CROSS APPLY
XmlData.nodes('//user') AS User(user) CROSS APPLY
user.nodes('BusinessAddress') AS BusinessAddress(businessaddress) CROSS APPLY
user.nodes('Employment') AS Employment(employment)
-- More 'joins' here ...
to fill the 'UserFlat' table ?
Some disadvantages are that you have to manually type the SQLcode, but the advantage here is that I have more direct control how the elements are processed and converted. But I don't know if there are any performance differences between processing XML in SSIS and processing the XML with T-SQL XML statements.
Note that some other requirements are:
- Error handling : in case of an error, an email must be send to a person.
- Able to process multiple input files with a specific file name pattern : XML_{date}_{time}.xml
- Move the processed XML files to a different folder.
Please advice.
Based on the requirements that you have mentioned, I would say that you can use best of both the worlds (T-SQL & SSIS).
I feel that T-SQL gives more flexibility in loading the XML data that you have described in the question.
There are lot of different ways you can achieve this. Here is one possible option:
Create a Stored Procedure that would take the path of the XML file as input parameter.
Perform your XML data load operation using the T-SQL way which you feel is easier.
Use SSIS package to perform error handling, file processing, archiving and send email.
Use logging feature available in SSIS. It just requires simple configuration. Here is a samples that show how to configure logging in SSIS How to track status of rows successfully processed or failed in SSIS data flow task?
A sample mock up of your flow would be as shown below in the screenshot. Loop the files using Foreach Loop container. Pass the file path as parameter to Execute SQL Task, which in turn would call the T-SQL that you had mentioned. After processing the file, using the File System Task to move the file to an archive folder.
Sample used in SSIS reading multiple xml files from folder
shows how to loop through files using Foreach loop container. It loops through xml files but uses Data Flow Task because the xml files are in simpler format.
Sample used in How to send the records from a table in an e-mail body using SSIS package? shows how to send e-mail using Send Mail Task.
Sample used in How do I move files to an archive folder after the files have been processed? shows how to move files to an Archive folder.
Sample used in Branching after a file system task in SSIS without failing the package shows how to continue package execution even after a particular task fails. This will help you to proceed with package execution even if Foreach Loop fails so you can send email. Blue arrow in the screenshot indicates on completion of previous task.
Sample used in How do I pick the most recently created folder using Foreach loop container in SSIS package? shows how to perform pattern matching.
Hope that gives you an idea.