How to extract XML data from CrossRef using R?

Posted 2020-04-11 11:07

Question:

If you put in your CrossRef email, the following URL produces an XML file:

"http://www.crossref.org/openurl?title=Science&aulast=Fernández&date=2009&multihit=true&pid=your.crossref.email"

An example file is available here:

crossref.xml

I wish to extract the list of DOIs (Digital Object Identifiers) into a data.frame in R. I wish to do so using one of the general R XML packages:

library(XML) or library(tm)

I have tried

doc<-xmlTreeParse(file)
top<-xmlRoot(doc)

but cannot figure out where to go from here.

top[[1]]["doi"]

does not work.

Answer 1:

Try this:

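library(XML)
# Parse into an internal document so that XPath queries can be run on it
doc <- xmlTreeParse("crossref.xml", useInternalNodes = TRUE)
root <- xmlRoot(doc)
# "x" serves as a prefix for the document's default namespace
xpathSApply(root, "//x:doi", xmlValue, namespaces = "x")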
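
To get the DOIs into a data.frame, as the question asks, something along these lines might work. This is only a minimal sketch: the inline XML is a made-up stand-in for crossref.xml, the DOIs in it are placeholders, and the namespace URI is an assumption about what the real file declares, so check it against your own file. It also spells out the namespace as a named vector rather than relying on the single-prefix shorthand.

```r
library(XML)

# Hypothetical stand-in for crossref.xml; the structure, DOIs and
# namespace URI are assumptions made for illustration only.
ns_uri <- "http://www.crossref.org/qrschema/2.0"
xml_text <- paste0(
  '<crossref_result xmlns="', ns_uri, '">',
  '<query_result><body>',
  '<query><doi type="journal_article">10.1000/example.1</doi></query>',
  '<query><doi type="journal_article">10.1000/example.2</doi></query>',
  '</body></query_result>',
  '</crossref_result>'
)

# Parse the string into an internal document that supports XPath
doc <- xmlParse(xml_text, asText = TRUE)

# Map the prefix "x" explicitly to the default namespace URI,
# then pull the text of every <doi> element
dois <- xpathSApply(doc, "//x:doi", xmlValue, namespaces = c(x = ns_uri))

# One row per DOI, kept as character rather than factor
doi_df <- data.frame(doi = dois, stringsAsFactors = FALSE)
```

For the real file, replacing the xmlParse call with xmlParse("crossref.xml") should be enough, provided the namespace URI matches.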


Answer 2:

As part of rOpenSci, others and I have written some functions for querying the Crossref API: see the functions crossref and crossref_r here.



Answer 3:

I had exactly the same lack of understanding. I spent a day and a half looking and finally came across this post.

Thanks!!!



Tags: xml r metadata