How best to use XPath with very large XML files in .NET

I need to do some processing on fairly large XML files (large here being potentially upwards of a gigabyte) in C#, including some complex XPath queries. The problem is that the standard way I would normally do this through the System.Xml libraries loads the whole file into memory before doing anything with it, which can cause memory problems with files of this size.

I don't need to update the files at all, just read them and query the data they contain. Some of the XPath queries are quite involved and span several levels of parent-child relationships. I'm not sure whether this will affect the ability to use a stream reader rather than loading the data into memory as a block.

One way I can see of making it work is to perform the simple analysis using a stream-based approach and perhaps wrap the XPath statements in XSLT transformations that I could run across the files afterward, although that seems a little convoluted.

Alternatively, I know there are some elements that the XPath queries will not run across, so I guess I could break the document up into a series of smaller fragments based on its original tree structure, which could perhaps be small enough to process in memory without causing too much havoc.

I've tried to explain my objective here, so if I'm barking up totally the wrong tree in terms of general approach, I'm sure you folks can set me right...

Tags: c# .net xml xpath large-files

10 answers

狗以群分

Answer #2 · 2019-01-22 14:20

In order to perform XPath queries with the standard .NET classes, the whole document tree needs to be loaded into memory, which might not be a good idea if it can take up to a gigabyte. IMHO XmlReader is a good fit for such tasks, since it reads the document forward-only, one node at a time, without building a tree.
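A minimal sketch of that approach, assuming a hypothetical layout of `<orders><order id="…">` elements: it streams with `XmlReader` and keeps only a stack of open element names, which is enough to emulate a simple absolute path such as `/orders/order/@id`. It does not support predicates, axes or any other real XPath features, and the class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper: emulates a simple absolute path like /orders/order/@id
// while streaming. Not a general XPath engine.
public static class XmlStreamPath
{
    // Yields the value of `attribute` on every element whose full path from the
    // root equals `path` (e.g. { "orders", "order" }). Only the stack of open
    // element names is kept in memory, never the document tree.
    public static IEnumerable<string> Select(TextReader input, string[] path, string attribute)
    {
        var stack = new List<string>();
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    stack.Add(reader.LocalName);
                    if (Matches(stack, path))
                    {
                        string value = reader.GetAttribute(attribute);
                        if (value != null)
                            yield return value;
                    }
                    // <a/> produces no EndElement node, so pop it right away.
                    if (reader.IsEmptyElement)
                        stack.RemoveAt(stack.Count - 1);
                }
                else if (reader.NodeType == XmlNodeType.EndElement)
                {
                    stack.RemoveAt(stack.Count - 1);
                }
            }
        }
    }

    // True when the current element path is exactly the requested path.
    private static bool Matches(List<string> stack, string[] path)
    {
        if (stack.Count != path.Length)
            return false;
        for (int i = 0; i < path.Length; i++)
        {
            if (stack[i] != path[i])
                return false;
        }
        return true;
    }
}
```

For example, `XmlStreamPath.Select(File.OpenText(path), new[] { "orders", "order" }, "id")` would yield the `id` of every `order` directly under the root, where `path` is your file. Memory use stays proportional to nesting depth rather than file size.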

神经病院院长

Answer #3 · 2019-01-22 14:23

http://msdn.microsoft.com/en-us/library/bb387013.aspx has a relevant example leveraging XStreamingElement.
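A minimal sketch of that pattern (not the code from the linked article): an iterator reads the file with `XmlReader` and materializes one matching element at a time via `XNode.ReadFrom`, a LINQ query filters them, and `XStreamingElement` writes the result without building the output tree in memory first. The `item` element, its `sku`/`price` attributes and the method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical example: streams <item sku=".." price=".."/> elements and
// writes the SKUs of expensive ones, holding one item in memory at a time.
public static class StreamingTransform
{
    // Yields each element named `name` as a small XElement, materializing one
    // subtree at a time instead of the whole document.
    public static IEnumerable<XElement> StreamElements(TextReader input, string name)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == name)
                {
                    // ReadFrom consumes the whole element and leaves the reader
                    // on the following node, so no extra Read() is needed here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    // XStreamingElement defers enumerating its content until Save, so matching
    // elements flow from the input reader straight to the output writer.
    public static void WriteExpensiveItems(TextReader input, TextWriter output, decimal minPrice)
    {
        var result = new XStreamingElement("expensive",
            from item in StreamElements(input, "item")
            where (decimal?)item.Attribute("price") >= minPrice // missing price => excluded
            select new XElement("sku", (string)item.Attribute("sku")));
        result.Save(output);
    }
}
```

Because the query is only enumerated inside `Save`, the input reader has to stay open until `Save` returns.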

forever°为你锁心

Answer #4 · 2019-01-22 14:23

You've outlined your choices already.

Either you abandon XPath and use XmlTextReader, or you break the document up into manageable chunks on which you can use XPath.

If you choose the latter, use XPathDocument: its read-only restriction allows better use of memory.
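A minimal sketch of the chunking route, assuming the file is a sequence of self-contained elements (hypothetically named `batch` here) that your XPath never needs to look outside of: `XmlReader.ReadToFollowing` finds each chunk, `ReadSubtree` hands just that subtree to an `XPathDocument`, and a precompiled XPath expression runs against it. The class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.XPath;

// Hypothetical helper: runs one XPath query per chunk, loading only the
// current chunk into a read-only XPathDocument.
public static class ChunkedXPath
{
    public static IEnumerable<string> Query(TextReader input, string chunkName, string xpath)
    {
        // Compile once and reuse for every chunk.
        XPathExpression compiled = XPathExpression.Compile(xpath);
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.ReadToFollowing(chunkName))
            {
                // ReadSubtree exposes only the current element and its descendants;
                // disposing it leaves the outer reader at the chunk's end tag.
                using (XmlReader subtree = reader.ReadSubtree())
                {
                    var doc = new XPathDocument(subtree);
                    XPathNavigator nav = doc.CreateNavigator();
                    XPathNodeIterator it = nav.Select(compiled);
                    while (it.MoveNext())
                        yield return it.Current.Value;
                }
            }
        }
    }
}
```

The XPath is evaluated relative to each chunk's own document root, so for example `batch/order/line/@sku` selects the `sku` attributes within each `batch`. Expressions that reach ancestors or siblings outside the chunk will simply return nothing, which is why this only works if you can pick chunk boundaries your queries never cross.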

爱情/是我丢掉的垃圾

Answer #5 · 2019-01-22 14:27

Since in your case the data can run to gigabytes, have you considered using ADO.NET with XML as a database? The memory footprint would then not be huge.

Another approach would be to use LINQ to XML with classes such as XStreamingElement. Hope this helps.