I'm trying to get all the info from this page: http://ws.parlament.ch/affairs/19110758/?format=xml
First I download the file into `file` and then parse it with `xmlParse(file)`.
library(XML)
destfile <- "affair.xml"  # assumed local path; any writable file name works
download.file(url = "http://ws.parlament.ch/affairs/19110758/?format=xml", destfile = destfile)
file <- xmlParse(destfile)
Now I want to extract the information I need, for example the title and the ID number. I tried something like this:
title <- xpathSApply(file, "//h2", xmlValue)
But this only gives me an error: unable to find an inherited method for function ‘saveXML’ for signature ‘"XMLDocument"’
The next thing I tried was this:
library(plyr)
test <- ldply(xmlToList(file), function(x) { data.frame(x[names(x) != "id"]) })
This gives me a `data.frame` with some of the info, but I lose fields such as `id` (which is the most important one).
I'd like to get a `data.frame` with exactly one row per affair, containing all the information of that affair, such as `id`, `updated`, `additionalIndexing`, `affairType`, etc.
With this, it works (example for `id`):
infofile <- xmlRoot(file)
nodes <- getNodeSet(file, "//affair/id")
id <- as.numeric(lapply(nodes, function(x) xmlSApply(x, xmlValue)))
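To make clearer what I am after, here is a rough sketch of how the `id` approach might be extended to several fields at once. It assumes the `XML` package and uses a made-up, heavily trimmed stand-in for the response instead of the real download. The `xml_text` sample, its values, and the `fields` vector are my own assumptions, and it only handles simple fields that are direct children of `<affair>`.

```r
library(XML)

# Hypothetical, trimmed stand-in for the web service response (not the real data)
xml_text <- "<affair><id>19110758</id><updated>2011-01-01T00:00:00Z</updated><title>Example</title></affair>"
doc <- xmlParse(xml_text, asText = TRUE)

# Assumed list of simple child elements to pull out of <affair>
fields <- c("id", "updated", "title")

# Look up each field by XPath; use NA when the element is missing
values <- sapply(fields, function(f) {
  v <- xpathSApply(doc, paste0("/affair/", f), xmlValue)
  if (length(v) == 0) NA_character_ else v[1]
})

# One row, one column per field
affair <- as.data.frame(as.list(values), stringsAsFactors = FALSE)
```

Nested elements such as `additionalIndexing` would probably need their own handling, which is the part I am unsure about.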