I have an Excel file that shows the following warning message when I try to open it:
The file you are trying to open, 'name.ext', is in a different format than specified by the file extension. Verify that the file is not corrupted and is from a trusted source before opening the file. Do you want to open the file now?
When I click Yes, the file opens fine. However, I want to read this file in R, and I couldn't get R to load the content because of this format mismatch. How can I achieve this?
One example of the files I want to open with R can be downloaded here. I use MS Office 2016.
This is an XML file with a UTF-16 BOM (byte order mark) at the beginning. You can read it with R:
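library(xml2)
library(rvest)
# read_html() uses the lenient HTML parser, which ignores the XML namespaces
# and lowercases tag names, hence the lowercase "//cell/data" below.
xls <- read_html("LU0444605991_434.xls")
values <- html_text(html_nodes(xls, xpath="//cell/data"))
# The first four values are not part of the date/NAV pairs, so skip them
# and fold the rest into two columns, row by row.
dat <- data.frame(matrix(values[5:length(values)], ncol=2, byrow=TRUE),
                  stringsAsFactors=FALSE)
colnames(dat) <- c("datum", "nav")
dat$nav <- as.numeric(dat$nav)
head(dat)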
##                 datum      nav
## 1 2009-10-05T00:00:00 117.1047
## 2 2009-10-06T00:00:00 117.0746
## 3 2009-10-07T00:00:00 117.0915
## 4 2009-10-08T00:00:00 117.0822
## 5 2009-10-09T00:00:00 116.8312
## 6 2009-10-12T00:00:00 116.9347
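If you want to check for yourself that a file like this is XML behind a UTF-16 BOM, looking at its first bytes is usually enough. The sketch below does not touch the real download: it writes a tiny stand-in file (the path and content are made up for illustration) and then inspects it with base R only. It assumes your platform's iconv knows the name "UTF-16LE".

```r
# Hypothetical stand-in for the real file: a tiny UTF-16LE XML file with a BOM.
path <- tempfile(fileext = ".xls")
con <- file(path, "wb")
writeBin(as.raw(c(0xFF, 0xFE)), con)  # UTF-16LE byte order mark
writeBin(iconv("<?xml version=\"1.0\"?><Workbook/>", "UTF-8", "UTF-16LE",
               toRaw = TRUE)[[1]], con)
close(con)

# First two bytes: FF FE means UTF-16LE, FE FF means UTF-16BE.
bom <- readBin(path, what = "raw", n = 2)
print(bom)

# Decode everything after the BOM and look at how the content starts.
raw_bytes <- readBin(path, what = "raw", n = file.info(path)$size)
header <- iconv(list(raw_bytes[-(1:2)]), from = "UTF-16LE", to = "UTF-8")
print(substr(header, 1, 5))
```

For your own file, replace path with the file name and skip the part that writes the stand-in.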
You can use just the xml2 package (and read_xml) if you really want to bash your head against the wall repeatedly dealing with the crazy XML namespaces in these Microsoft documents.
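For anyone who does want to go the read_xml route, here is a rough sketch of what the namespace handling could look like. The inline document is a made-up miniature, assuming the file follows the Excel 2003 SpreadsheetML layout (Workbook / Worksheet / Table / Row / Cell / Data in the urn:schemas-microsoft-com:office:spreadsheet namespace); the real file may be structured differently and will contain more cells than these.

```r
library(xml2)

# Made-up miniature of a SpreadsheetML document (assumption: the real file
# uses the same namespace and Cell/Data nesting).
doc <- read_xml('
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Worksheet ss:Name="Sheet1">
    <Table>
      <Row>
        <Cell><Data ss:Type="DateTime">2009-10-05T00:00:00</Data></Cell>
        <Cell><Data ss:Type="Number">117.1047</Data></Cell>
      </Row>
      <Row>
        <Cell><Data ss:Type="DateTime">2009-10-06T00:00:00</Data></Cell>
        <Cell><Data ss:Type="Number">117.0746</Data></Cell>
      </Row>
    </Table>
  </Worksheet>
</Workbook>')

# Bind a prefix of our own choosing to the namespace URI and use it in XPath.
ns <- c(ss = "urn:schemas-microsoft-com:office:spreadsheet")
values <- xml_text(xml_find_all(doc, "//ss:Cell/ss:Data", ns))

dat <- data.frame(matrix(values, ncol = 2, byrow = TRUE),
                  stringsAsFactors = FALSE)
colnames(dat) <- c("datum", "nav")
print(dat)
```

Unlike the read_html approach, element names are case-sensitive here and every step of the XPath needs the prefix, which is where most of the head-bashing comes from.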
You'll still need to do date/time conversion and numeric conversion.
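A minimal sketch of those conversions in base R, using a few of the values from the head(dat) output above as stand-in data. The UTC time zone is an assumption; the file itself does not say which zone the timestamps are in.

```r
# Stand-in data copied from the printed output; both columns start as text.
dat <- data.frame(
  datum = c("2009-10-05T00:00:00", "2009-10-06T00:00:00", "2009-10-07T00:00:00"),
  nav   = c("117.1047", "117.0746", "117.0915"),
  stringsAsFactors = FALSE
)

# Parse the ISO-8601-style timestamps; tz = "UTC" is an assumption.
dat$datum <- as.POSIXct(dat$datum, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")

# Text to numeric (a no-op if the column is already numeric).
dat$nav <- as.numeric(dat$nav)

str(dat)
```

Since every timestamp in the sample is at midnight, as.Date(dat$datum, tz = "UTC") would give you plain dates if you don't need the time part.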