How to open an Excel file in R where file format a

Posted 2019-08-10 06:11

I have an Excel file that shows the following warning message when I try to open it:

The file you are trying to open, 'name.ext', is in a different format than specified by the file extension. Verify that the file is not corrupted and is from a trusted source before opening the file. Do you want to open the file now?

When I click Yes to open it, everything is fine. However, I want to read this file in R and could not get R to load the content despite the warning. How can I achieve this?

One example of the files I want to open with R can be downloaded here. I use MS Office 2016.

1 answer
对你真心纯属浪费
#2 · 2019-08-10 06:38

This is an XML file with a UTF-16 BOM (byte order mark) at the beginning. You can read it with R:

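library(xml2)
library(rvest)

# read_html() parses leniently and lower-cases tag names,
# which sidesteps the XML namespaces in the file
xls <- read_html("LU0444605991_434.xls")
# Text of every <data> node nested in a <cell>
values <- html_text(html_nodes(xls, xpath="//cell/data"))
# Skip the first four values, then reshape the rest into two columns
dat <- data.frame(matrix(values[5:length(values)], ncol=2, byrow=TRUE),
                  stringsAsFactors=FALSE)
colnames(dat) <- c("datum", "nav")
dat$nav <- as.numeric(dat$nav)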

head(dat)
##                 datum      nav
## 1 2009-10-05T00:00:00 117.1047
## 2 2009-10-06T00:00:00 117.0746
## 3 2009-10-07T00:00:00 117.0915
## 4 2009-10-08T00:00:00 117.0822
## 5 2009-10-09T00:00:00 116.8312
## 6 2009-10-12T00:00:00 116.9347
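
If you want to check the BOM diagnosis on a file of your own, looking at the first two bytes is usually enough. The sketch below is only an illustration: it writes a tiny stand-in file to a temporary path (the bytes are made up for the example, not taken from the file in the question) and then inspects it with base R. FF FE means UTF-16 little-endian, FE FF means UTF-16 big-endian.

```r
# Hypothetical stand-in file: a UTF-16LE BOM followed by "<" encoded as UTF-16LE
path <- tempfile(fileext = ".xls")
writeBin(as.raw(c(0xff, 0xfe, 0x3c, 0x00)), path)

# Read only the first two bytes of the file
first_bytes <- readBin(path, what = "raw", n = 2)

is_utf16le <- identical(first_bytes, as.raw(c(0xff, 0xfe)))
is_utf16be <- identical(first_bytes, as.raw(c(0xfe, 0xff)))

print(first_bytes)
print(c(utf16le = is_utf16le, utf16be = is_utf16be))
```

For a real file, replace `path` with the file name and drop the `writeBin()` line.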

You can use just the xml2 package (and read_xml) instead, if you really want to bash your head against the wall repeatedly to deal with the crazy XML namespaces in these Microsoft documents.
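
For what it's worth, one way to soften the namespace pain with read_xml is to match on local-name() in the XPath, which ignores namespaces entirely. This is a rough sketch, not a tested recipe for the file in the question: `sample_xml` is a made-up two-row stand-in, and the assumption that the real file uses SpreadsheetML-style Workbook/Row/Cell/Data elements is mine (it is at least consistent with the //cell/data XPath above).

```r
library(xml2)

# Hypothetical stand-in for the real file (assumed SpreadsheetML-style layout)
sample_xml <- '<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
  xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Worksheet ss:Name="Sheet1"><Table>
    <Row>
      <Cell><Data ss:Type="DateTime">2009-10-05T00:00:00</Data></Cell>
      <Cell><Data ss:Type="Number">117.1047</Data></Cell>
    </Row>
    <Row>
      <Cell><Data ss:Type="DateTime">2009-10-06T00:00:00</Data></Cell>
      <Cell><Data ss:Type="Number">117.0746</Data></Cell>
    </Row>
  </Table></Worksheet>
</Workbook>'

doc <- read_xml(sample_xml)

# local-name() compares the element name without its namespace,
# so no prefix mapping has to be passed to xml_find_all()
cells <- xml_find_all(doc, "//*[local-name()='Cell']/*[local-name()='Data']")
values <- xml_text(cells)

dat <- data.frame(matrix(values, ncol = 2, byrow = TRUE),
                  stringsAsFactors = FALSE)
colnames(dat) <- c("datum", "nav")
print(dat)
```

With the real file you would pass the file name to read_xml() and probably still skip the leading header cells, as in the read_html version.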

You'll still need to do date/time conversion and numeric conversion.
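
A minimal sketch of those conversions, using two rows copied from the head(dat) output above so it runs on its own. The UTC time zone is my assumption; the file itself does not state one.

```r
# Two sample rows copied from the head(dat) output, still as text
dat <- data.frame(
  datum = c("2009-10-05T00:00:00", "2009-10-06T00:00:00"),
  nav = c("117.1047", "117.0746"),
  stringsAsFactors = FALSE
)

# Parse the ISO-8601-style timestamps; tz = "UTC" is an assumption
dat$datum <- as.POSIXct(dat$datum, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")

# Convert the value column from character to numeric
dat$nav <- as.numeric(dat$nav)

str(dat)
```

If the time part is always midnight, as.Date(substr(x, 1, 10)) on the original strings would give plain dates instead.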
