So I have a bunch of excel files I want to loop through and read specific, discontinuous columns into a data frame. Using the readxl
works for the basic stuff like this:
library(readxl)
library(plyr)
wb <- list.files(pattern = "*.xls")
dflist <- list()
for (i in wb){
dflist[[i]] <- data.frame(read_excel(i, sheet = "SheetName", skip=3, col_names = TRUE))
}
# now put them into a data frame
data <- ldply(dflist, data.frame, .id = NULL)
This works (barely) but the problem is my excel files have about 114 columns and I only want specific ones. Also I do not want to allow R
to guess the col_types
because it messes some of them up (eg for a string column, if the first value starts with a number, it tries to interpret the whole column as numeric, and crashes). So my question is: How do I specify specific, discontinuous columns to read? The range
argument uses the cell_ranger
package which does not allow for reading discontinuous columns. So any alternative?
.xlsx >>> you can use library
openxlsx
The
read.xlsx
function from libraryopenxlsx
has an optional parametercols
that takes a numeric index, specifying which columns to read.It seems it reads all columns as characters if at least one column contains characters.
.xls >>> you can use library
XLConnect
The potential problem is that library
XLConnect
requires libraryrJava
, which might be tricky to install on some systems. If you can get it running, thekeep
anddrop
parameters ofreadWorksheet()
accept both column names and indices. ParametercolTypes
deals with column types. This way it works for me:Edit:
Library
readxl
works well for both .xls and .xlsx if you want to read a range (rectangle) from your excel file. E.g.