-->

Bulk filter & parse ONIX data (XML) in Excel

2019-08-22 02:15发布

问题:

I have 3000 ONIX-style xml URLs with book metadata. I would like to filter a few nodes out of these files and parse them into Excel rows in a single worksheet.

This is an example of a XML-path: http://btsoep.appspot.com/rest/book/9789082516425

I would like to filter this XML data on things as

<Product>
 <Title>
  <TitleText>

and

<Product>
 <Contributor>
  <PersonName>

I need the retrieved Data in an Excel worksheet where every URL has it's own row. So the URLs are in row A, TitleText is in row B and PersonName is in row C.

How would I go by doing this?

EDIT 1:

What I've tried so far is: -using wget to download all the xml data first and then try to bulk parse this is excel. This could work but is unnecessary. -using the default XMLimport function in Excel. I can't seem to run this in a batch.

I don't have any experience with filtering XML files and parsing this in an Excel worksheet. I'm not asking anyone to fix this for me or write code for me, but I would like a step in the good direction. Which tools would be the best to use for this case? Thanks again.

回答1:

this should get you started

Option Explicit

Sub parseONIX()

    Dim URL As String

    URL = "http://btsoep.appspot.com/rest/book/9789082516425"
 '  URL = "https://www.w3schools.com/xml/plant_catalog.xml"

    Dim XMLPage As New MSXML2.XMLHTTP60
    XMLPage.Open "GET", URL, False
    XMLPage.send

    Dim XMLDoc As New MSXML2.DOMDocument
    XMLDoc.LoadXML XMLPage.responseText

    Debug.Print XMLDoc.ChildNodes(0).BaseName
    Debug.Print XMLDoc.ChildNodes(1).BaseName
    Debug.Print XMLDoc.ChildNodes(1).ChildNodes(0).BaseName
    Debug.Print XMLDoc.ChildNodes(1).ChildNodes(1).BaseName
    Debug.Print XMLDoc.getElementsByTagName("Product").Item(0).BaseName

    Dim i As Integer

    For i = 0 To XMLDoc.getElementsByTagName("Measure").Length - 1

        Debug.Print "type: "; XMLDoc.getElementsByTagName("Measure")(i).ChildNodes(0).Text,
        Debug.Print XMLDoc.getElementsByTagName("Measure")(i).ChildNodes(1).Text,
        Debug.Print XMLDoc.getElementsByTagName("Measure")(i).ChildNodes(2).Text

    Next i



End Sub