Parsing Large XML Files Efficiently

Posted 2019-08-09 02:03

I need to parse large XML files and save the data to MS SQL Server database tables. One obvious way is to write a C# program, but that raises the question of performance. Do you know the fastest, most efficient way to process large XML files?

2 Answers
疯言疯语
#2 · 2019-08-09 02:48

The answer depends on the details of your scenario. How large is the XML file? Are you storing the entire XML file in the database, or just certain parts of it? Are you storing the XML as a blob in the database, or are you putting the different elements and attributes into their own dedicated columns?

C# will work fine for your needs, but there are different XML related APIs depending on your scenario.

If you want to deserialize the entire XML document into .NET objects, then you can define your objects in C# and use System.Xml.Serialization.XmlSerializer to load the document into memory.
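
As a rough illustration of the XmlSerializer route, here is a minimal sketch. The document shape (an Orders root holding Order elements with an Id attribute and a Customer child) and the names OrderList, Order, and OrderLoader are made up for this example; they are not from the question.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical shape: <Orders><Order Id="1"><Customer>Ann</Customer></Order>...</Orders>
[XmlRoot("Orders")]
public class OrderList
{
    // Each <Order> child of the root becomes one list entry.
    [XmlElement("Order")]
    public List<Order> Items { get; set; } = new List<Order>();
}

public class Order
{
    [XmlAttribute("Id")]
    public int Id { get; set; }

    [XmlElement("Customer")]
    public string Customer { get; set; }
}

public static class OrderLoader
{
    // Deserializes the whole document in one call, so every Order
    // is held in memory at the same time.
    public static OrderList Load(TextReader xml)
    {
        var serializer = new XmlSerializer(typeof(OrderList));
        return (OrderList)serializer.Deserialize(xml);
    }
}
```

Calling OrderLoader.Load(File.OpenText(path)) would give you a fully populated OrderList to loop over and write to the database. That is convenient, but it only suits files that fit comfortably in memory.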

However, if the document is really large and you can't afford to load the whole thing into memory at once, you'll probably want to use System.Xml.XmlReader, a forward-only reader that lets you grab elements and attributes one at a time and shove them into your database.
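
A minimal sketch of the streaming approach, assuming a hypothetical document made of Order elements with an Id attribute and a Customer child (the names OrderStreamer and Read are invented for this example). The database write is deliberately left out; each yielded record could be passed to a parameterized INSERT or batched for a bulk load.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class OrderStreamer
{
    // Yields one (Id, Customer) pair at a time; only the current node
    // is held by the reader, never the whole document.
    public static IEnumerable<(int Id, string Customer)> Read(TextReader xml)
    {
        using (XmlReader reader = XmlReader.Create(xml))
        {
            // Advance to each <Order> element in document order.
            while (reader.ReadToFollowing("Order"))
            {
                int id = int.Parse(reader.GetAttribute("Id"));

                // Customer stays null when the Order has no <Customer> child.
                string customer = null;
                if (reader.ReadToDescendant("Customer"))
                    customer = reader.ReadElementContentAsString();

                yield return (id, customer);
            }
        }
    }
}
```

Because Read is a lazy iterator, a foreach over OrderStreamer.Read(File.OpenText(path)) pulls records from the file only as fast as the loop body consumes them.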

一夜七次
#3 · 2019-08-09 02:54

If you want to pursue a C# solution, look into XmlReader. This will give you forward-only, streaming access to your XML file. Note the forward-only part: you can't go back to a node once you've read past it. If you need to do more complex manipulations on child nodes, you'd probably do well to use a combination of XmlReader and XDocument, i.e. reading the large file with an XmlReader and then using ReadSubtree() to load individual subtrees into XDocuments. For example, if your document is something like:

<root>
    <big-child-1>
        <grandchild-a>
            ...
        </grandchild-a>
        <grandchild-b>
            ...
        </grandchild-b>
    </big-child-1>
    <big-child-2>
        ... 
    </big-child-2>
</root>

You might do something like this:

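using System.Xml;
using System.Xml.Linq;

using (XmlReader xr = XmlReader.Create("C:\\file.xml"))
{
    xr.MoveToContent();

    while (xr.Read())
    {
        // Only react to start tags; skip end tags, whitespace, etc.
        if (xr.NodeType != XmlNodeType.Element)
            continue;

        if (xr.Name == "grandchild-a")
        {
            // ReadSubtree() returns a reader limited to the current element
            using (XmlReader subtree = xr.ReadSubtree())
            {
                XDocument xd = XDocument.Load(subtree); // now you have an XDocument with all the content under the grandchild-a node
            }
        }
        // else if (xr.Name == ...)
    }
}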

However, the more you can just use XmlReader, the more performant it'll be.

Here's some documentation:

You do have other options of course:

  • SQL Server has XML functionality (look into OPENXML)
  • SSIS: you mention concerns about memory usage here, but it's an option.
  • XSLT: probably not as good an option as using XmlReader in this case, but you might be able to create XSLT that would then create a SQL query from your XML.