Parse an XML file with R

Posted 2019-08-06 13:08

Question:

I'm starting a project in R and I have to parse an XML file. I'm using the XML library and functions such as xmlToDataFrame, xmlParse, etc. I want to store the information in a structured way in a data frame, but I've run into a problem: I can't get the variables inside a node into separate columns. With the functions mentioned above, all the values of an item end up in a single cell on a single row of the data frame.

The XML I use is as follows:

<?xml version="1.0" encoding="UTF-8"?>
-<rest-response>
  <type>rest-response</type>
  <time-stamp>1392217780000</time-stamp>
  <status>OK</status>
  <msg-version>1.0.0</msg-version>
  <op>inventory</op>
  -<response>
    <inventorySize>3</inventorySize>
    <inventoryMode>SYNCHRONOUS</inventoryMode>
    <time>4952</time>
    -<items>
      -<item>
        <epc>00000000000000000000A195</epc>
        <ts>1392217779060</ts>
        <location-id>adtr</location-id>
        <location-pos>0,0,0</location-pos>
        <device-id>adtr@1</device-id>
        <device-reader>192.168.1.224</device-reader>
        <device-readerPort>1</device-readerPort>
        <device-readerMuxPort>0</device-readerMuxPort>
        <device-readerMuxPort2>0</device-readerMuxPort2>
        <tag-rssi>-49.0</tag-rssi>
        <tag-readcount>36.0</tag-readcount>
        <tag-phase>168.0</tag-phase>
      </item>
      -<item>
        <epc>00000000000000000000A263</epc>
        <ts>1392217779065</ts>
        <location-id>adtr</location-id>
        <location-pos>0,0,0</location-pos>
        <device-id>adtr@1</device-id>
        <device-reader>192.168.1.224</device-reader>
        <device-readerPort>1</device-readerPort>
        <device-readerMuxPort>0</device-readerMuxPort>
        <device-readerMuxPort2>0</device-readerMuxPort2>
        <tag-rssi>-49.0</tag-rssi>
        <tag-readcount>36.0</tag-readcount>
        <tag-phase>0.0</tag-phase>
      </item>
      -<item>
        <epc>B00000000000001101080802</epc>
        <ts>1392217779323</ts>
        <location-id>adtr</location-id>
        <location-pos>0,0,0</location-pos>
        <device-id>adtr@1</device-id>
        <device-reader>192.168.1.224</device-reader>
        <device-readerPort>1</device-readerPort>
        <device-readerMuxPort>0</device-readerMuxPort>
        <device-readerMuxPort2>0</device-readerMuxPort2>
        <tag-rssi>-72.0</tag-rssi>
        <tag-readcount>27.0</tag-readcount>
        <tag-phase>157.0</tag-phase>
      </item>
    </items>
  </response>
</rest-response>

Everything inside each item comes back as a single value, and I want to split it into its separate fields.

Another important point: the content of the XML may change. Its structure will always be the same, but there may be more items.

Any ideas?

Answer 1:

I assume you want the <item> elements in a data frame. Assuming your XML is in the variable xml.text, this will work:

library(XML)
xml   <- xmlInternalTreeParse(xml.text)  # assumes your xml in variable xml.text
items <- getNodeSet(xml,"//items/item")
df    <- xmlToDataFrame(items)
df
#                        epc            ts location-id location-pos device-id device-reader device-readerPort device-readerMuxPort device-readerMuxPort2 tag-rssi tag-readcount tag-phase
# 1 00000000000000000000A195 1392217779060        adtr        0,0,0    adtr@1 192.168.1.224                 1                    0                     0    -49.0          36.0     168.0
# 2 00000000000000000000A263 1392217779065        adtr        0,0,0    adtr@1 192.168.1.224                 1                    0                     0    -49.0          36.0       0.0
# 3 B00000000000001101080802 1392217779323        adtr        0,0,0    adtr@1 192.168.1.224                 1                    0                     0    -72.0          27.0     157.0
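The columns produced this way typically come back as text (or factors, depending on your R version and the stringsAsFactors setting), so you may want to convert them afterwards. Below is a minimal sketch in base R, not a definitive recipe. It uses a hand-built stand-in for df with a few of the columns above, and it assumes that ts is a Unix timestamp in milliseconds and that location-pos holds three comma-separated numbers; the new column names time, pos_x, pos_y and pos_z are made up for illustration.

```r
# Stand-in for the result of xmlToDataFrame(items): every column is text.
df <- data.frame(
  epc             = c("00000000000000000000A195",
                      "00000000000000000000A263",
                      "B00000000000001101080802"),
  ts              = c("1392217779060", "1392217779065", "1392217779323"),
  `location-pos`  = c("0,0,0", "0,0,0", "0,0,0"),
  `tag-rssi`      = c("-49.0", "-49.0", "-72.0"),
  `tag-readcount` = c("36.0", "36.0", "27.0"),
  check.names = FALSE,        # keep the hyphenated column names as they are
  stringsAsFactors = FALSE
)

# Convert measurement columns to numeric. as.character() comes first so the
# conversion is also correct if the columns arrived as factors.
num_cols <- c("tag-rssi", "tag-readcount")
df[num_cols] <- lapply(df[num_cols], function(x) as.numeric(as.character(x)))

# Assumption: ts is milliseconds since the Unix epoch.
df$time <- as.POSIXct(as.numeric(as.character(df$ts)) / 1000,
                      origin = "1970-01-01", tz = "UTC")

# Assumption: location-pos is "x,y,z". Split it into three numeric columns.
pos <- do.call(rbind, strsplit(as.character(df[["location-pos"]]), ",", fixed = TRUE))
df$pos_x <- as.numeric(pos[, 1])
df$pos_y <- as.numeric(pos[, 2])
df$pos_z <- as.numeric(pos[, 3])

str(df)
```

Because the conversion works column by column, it does not depend on how many items the XML contains.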

I also assumed that you displayed this XML in a browser and copied and pasted it from there (which would explain the -<tag> markers). Otherwise, your XML is not well-formed.
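If the text you have really does contain those markers, they need to be removed before parsing. Here is a minimal sketch in base R; strip_browser_markers is a hypothetical helper name (it is not part of the XML package), and it assumes each marker sits at the start of a line, directly in front of a tag, as in the question.

```r
# Hypothetical helper: remove the "-" / "+" collapse markers that a browser's
# XML tree view shows in front of expandable elements.
strip_browser_markers <- function(txt) {
  # Drop a "-" or "+" only when it starts a line (after optional indentation)
  # and is immediately followed by "<", so values such as -49.0 are untouched.
  gsub("(?m)^([ \\t]*)[-+](?=<)", "\\1", txt, perl = TRUE)
}

# A shortened version of the pasted text from the question.
pasted <- paste(
  '<?xml version="1.0" encoding="UTF-8"?>',
  '-<rest-response>',
  '  <status>OK</status>',
  '  -<response>',
  '    <tag-rssi>-49.0</tag-rssi>',
  '  </response>',
  '</rest-response>',
  sep = "\n"
)

xml.text <- strip_browser_markers(pasted)
cat(xml.text, "\n")
```

The cleaned xml.text can then be passed to xmlInternalTreeParse as in the code above.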