In R, how can I loop over repeated XML nodes, and

I'm working with XML files from clinicaltrials.gov, which have a structure like this:

<clinical_study>
  ...
  <brief_title>
  ...
  <location>
    <facility>
      <name>
      <address>
        <city>
        <state>
        <zip>
        <country>
    </facility>
  </location>
  <location>
    ...
  </location>
  ...
</clinical_study>

I'm gathering information from multiple XML files, so the number of locations in each file is unknown and could even be zero. I need to extract all the information about each location and save it to an SQL table. I've had some success using functions from the XML package to extract information from single nodes, e.g.

library(XML)
nct_url <- "http://clinicaltrials.gov/ct2/show/NCT00112281?resultsxml=true"
xml_doc <- xmlParse(nct_url, useInternalNode=TRUE)
title_path <- "/clinical_study/brief_title" 
title_text <- xpathSApply(xml_doc, title_path, xmlValue)

I'm experimenting with getNodeSet, and this gives me a set of the right length:

doc <- xmlParse("NCT00007501.xml")
locations <- getNodeSet(doc, "/clinical_study/location")
length(locations)
[1] 22
class(locations)
[1] "XMLNodeSet"

but my attempts to extract information from this set have been mostly fruitless. Any suggestions?

Tags: xml r

3 Answers

乱世女痞

#2 · 2020-06-06 05:19

This code will put a subset of nodes that correspond to <location> from a clinical trial into a data frame:

library(XML)
clinicalTrialUrl <- "http://clinicaltrials.gov/ct2/show/NCT01480479?resultsxml=true"
xmlDoc <- xmlParse(clinicalTrialUrl, useInternalNode=TRUE)
locations <- xmlToDataFrame(getNodeSet(xmlDoc,"//location"))

In this case there are 221 locations. However, xmlToDataFrame assumes a flat structure and lumps subnodes together: everything under <facility> gets concatenated into a single string. To keep the fields separate, you can go into the subnodes and put them into the data frame one by one.
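Going into the subnodes could look something like the sketch below. It is only an illustration under stated assumptions: the inline XML is a made-up stand-in for a real record, and first_value, fields and location_df are hypothetical names, not part of the XML package. Each field is looked up with an XPath relative to its own <location> node, so a missing element becomes NA instead of shifting the other rows.

```r
library(XML)

# Made-up stand-in for a clinicaltrials.gov record (assumption, not real data).
xml_text <- '<clinical_study>
  <brief_title>Example</brief_title>
  <location><facility><name>Site A</name><address><city>Boston</city><state>Massachusetts</state><country>United States</country></address></facility></location>
  <location><facility><name>Site B</name><address><city>Paris</city><country>France</country></address></facility></location>
</clinical_study>'
doc <- xmlParse(xml_text, asText = TRUE)
locations <- getNodeSet(doc, "/clinical_study/location")

# Hypothetical helper: text of the first node matching a path that is
# relative to `node`, or NA when the element is absent.
first_value <- function(node, path) {
  hits <- xpathSApply(node, path, xmlValue)
  if (length(hits) == 0) NA_character_ else hits[[1]]
}

# Column name -> XPath relative to each <location> node.
fields <- c(name    = "facility/name",
            city    = "facility/address/city",
            state   = "facility/address/state",
            country = "facility/address/country")

# Build one column per field; a file with zero locations yields a 0-row data frame.
location_df <- as.data.frame(
  lapply(fields, function(p) vapply(locations, first_value, character(1), path = p)),
  stringsAsFactors = FALSE
)
```

The paths are deliberately relative (no leading slash): a path starting with // would search the whole document rather than the current location.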


爷的心禁止访问

#3 · 2020-06-06 05:25

Why not use xpathSApply again to retrieve the locations, just as you already did for the title?

xpathSApply(xml_doc, "//clinical_study/location" , xmlValue)
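One thing to keep in mind with this approach: xmlValue on a whole <location> node returns the text of all its descendants run together. A rough sketch of the difference, using a made-up inline document (the XML content and the variable names are assumptions for illustration only):

```r
library(XML)

# Made-up stand-in for a clinicaltrials.gov record (assumption, not real data).
xml_text <- '<clinical_study>
  <location><facility><name>Site A</name><address><city>Boston</city></address></facility></location>
  <location><facility><name>Site B</name><address><city>Paris</city></address></facility></location>
</clinical_study>'
xml_doc <- xmlParse(xml_text, asText = TRUE)

# One string per location, with name and city text concatenated.
whole_locations <- xpathSApply(xml_doc, "//clinical_study/location", xmlValue)

# Pointing the path at a leaf element gives one clean value per match.
cities <- xpathSApply(xml_doc, "/clinical_study/location/facility/address/city", xmlValue)
```

Note that the leaf-level query returns only the nodes that exist, so if some locations lack a <city>, the result will be shorter than the number of locations and can no longer be lined up with them by position.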


做自己的国王

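#4 · 2020-06-06 05:26

Here is an example that counts the matching nodes and joins their values into a single string (xml is the parsed document):

 # select every matching <method> node
 ns <- getNodeSet(xml, '//clinical_results/outcome_list/outcome/analysis_list/analysis/method')
 # number of nodes found
 element_cnt <- length(ns)
 # text of each node, joined with "|"
 strings <- paste(sapply(ns, xmlValue), collapse = "|")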
