I have a large XML file that needs parsed in VBA (excel 2003 & 2007). There could be upwards of 11,000 'rows' of data in the xml file with each 'row' having between 10 and 20 'columns'. This ends up being a huge task just to parse through and grab the data (5 - 7 minutes). I tried reading the xml and placing each 'row' into a dictionary (key = row number, value = Row Attributes), but this takes just as long.
It is taking forever to traverse the DOM. Is there a more efficient way?
Dim XMLDict
Sub ParseXML(ByRef RootNode As IXMLDOMNode)
Dim Counter As Long
Dim RowList As IXMLDOMNodeList
Dim ColumnList As IXMLDOMNodeList
Dim RowNode As IXMLDOMNode
Dim ColumnNode As IXMLDOMNode
Counter = 1
Set RowList = RootNode.SelectNodes("Row")
For Each RowNode In RowList
Set ColumnList = RowNode.SelectNodes("Col")
Dim NodeValues As String
For Each ColumnNode In ColumnList
NodeValues = NodeValues & "|" & ColumnNode.Attributes.getNamedItem("id").Text & ":" & ColumnNode.Text
Next ColumnNode
XMLDICT.Add Counter, NodeValues
Counter = Counter + 1
Next RowNode
End Sub
Use the
SelectSingleNode
function. This will let you search for a node based on pattern matching.For instance, I created the following function:
Now, if I have the following XML file:
I can simply call:
and it will jump through to the first key called 'Error Status' at any depth, and pull out the text in the 'Source' node - returning "INTEGRATION"
You could try using SAX instead of DOM. SAX should be faster when all you are doing is parsing the document and the document is non-trivial in size. The reference for the SAX2 implementation in MSXML is here
I typically reach straight for the DOM for most XML parsing in Excel but SAX seems to have advantages in some situations. The short comparison here might help to explain the differences between them.
Here's a hacked-together example (partially based on this) just using
Debug.Print
for output:Add a reference to "Microsoft XML, v6.0" via Tools > References
Add this code in a normal module
Add a class module, call it
ContentHandlerImpl
and add the following codeUse the left-hand drop-down at the top of the module to choose "IVBSAXContentHandler" and then use the right-hand drop-down to add stubs for each event in turn (from
characters
tostartPrefixMapping
)Add code to some of the stubs as follows
Explicitly set up the counter and the flag to show if we want to read text data at this time
Every time a new element starts, check the name of the element and take appropriate action
Check to see if we are interested in the text data and, if we are, chop off any extraneous white space and remove all line feeds (this may or may not be desirable depending on the document you are trying to parse)
If we have reached the end of a
Col
then stop reading the text values; if we have reached the end of aRow
then print out the string of node valuesTo make things clearer, here is the full version of
ContentHandlerImpl
with al of the stub methods in place: