Error: XML Content does not seem to be XML | R 3.1

Posted 2020-02-02 04:32

I am trying to load this XML file into R, but the parse fails. I looked at the other answers on the same topic but could not follow them. I am an R newbie.

> library(XML)
> fileURL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml"
> doc <- xmlTreeParse(fileURL,useInternal=TRUE)

Error: XML content does not seem to be XML: 'https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml'

Can you please help?

Tags: xml r parsing
4 answers
干净又极端
Reply #2 · 2020-02-02 04:56

The answer is in the getURL documentation at http://www.omegahat.net/RCurl/installed/RCurl/html/getURL.html. The key point is to pass ssl.verifyPeer=FALSE to getURL if a certificate error is shown.

library(RCurl)
library(XML)
curlVersion()$features
curlVersion()$protocol
## These should show ssl and https. I can see these on Windows 8.1 at least.
## It may differ on other OSes.

temp <- getURL("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml", ssl.verifyPeer=FALSE)
DFX <- xmlTreeParse(temp,useInternal = TRUE)

If the curlVersion() output does not list ssl or https support, look into using RCurl with HTTPS.
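
The same check can be done in code rather than by eye. This is only a rough sketch: it assumes curlVersion()$features is a named vector and curlVersion()$protocols is a character vector, which is how they print for me. The names has_https and has_ssl are just made up for the example.

```r
library(RCurl)

v <- curlVersion()

## TRUE if libcurl was built with https among its protocols
has_https <- "https" %in% v$protocols

## TRUE if ssl is among the listed features (assumes a named vector)
has_ssl <- "ssl" %in% names(v$features)

if (!has_https || !has_ssl) {
  message("This libcurl build does not report https/ssl support")
}
```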

狗以群分
Reply #3 · 2020-02-02 05:01

xmlTreeParse does not support https.

You can load the data with getURL (from RCurl) and then parse it.
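
A minimal sketch of that two-step approach, fetching the text first and then parsing it. The helper name read_xml_over_https, the stub fetcher, and the example.invalid URL and XML snippet are all invented for illustration. The fetch argument is only there so the function can be tried without a network connection; by default it would call getURL from RCurl.

```r
library(XML)

## Hypothetical helper: fetch the document as text, then parse the text.
## `fetch` can be any function(url) that returns the XML as one string.
read_xml_over_https <- function(url, fetch = RCurl::getURL) {
  txt <- fetch(url)
  xmlParse(txt, asText = TRUE)
}

## Offline illustration with a stub fetcher and a made-up document
stub_fetch <- function(url) "<response><row><name>410</name></row></response>"
doc <- read_xml_over_https("https://example.invalid/restaurants.xml",
                           fetch = stub_fetch)
xpathSApply(doc, "//name", xmlValue)
```

For the real file you would drop the fetch argument and pass the https URL from the question, assuming RCurl is installed.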

手持菜刀,她持情操
Reply #4 · 2020-02-02 05:07

You can use RCurl to fetch the content as text; the XML package then seems to be able to parse it.

library(XML)
library(RCurl)
fileURL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml"
xData <- getURL(fileURL)
doc <- xmlParse(xData)
地球回转人心会变
Reply #5 · 2020-02-02 05:09

Remove the s from https

library(XML)

fileURL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml"
## Anchor the pattern to the scheme so only the "s" in "https" is dropped
doc <- xmlTreeParse(sub("^https", "http", fileURL), useInternal = TRUE)
class(doc)
## [1] "XMLInternalDocument" "XMLAbstractDocument"