I have to read multiple xlsx file with random names into single dataframe. Structure of each file is same. I have to import specific columns only.
I tried this:
dat <- read.xlsx("FILE.xlsx", sheetIndex=1,
sheetName=NULL, startRow=5,
endRow=NULL, as.data.frame=TRUE,
header=TRUE)
But this is for only one file at a time and I couldn't specify my particular columns.
I even tried :
site=list.files(pattern='[.]xls')
but after that loop isn't working. How to do it? Thanks in advance.
I would read each sheet to a list:
Get file names:
f = list.files("./")
Read files:
dat = lapply(f, function(i){
x = read.xlsx(i, sheetIndex=1, sheetName=NULL, startRow=5,
endRow=NULL, as.data.frame=TRUE, header=T)
# Get the columns you want, e.g. 1, 3, 5
x = x[, c(1, 3, 5)]
# You may want to add a column to say which file they're from
x$file = i
# Return your data
x
})
You can then access the items in your list with:
dat[[1]]
Or do the same task to them with:
lapply(dat, colmeans)
Turn them into a data frame (where your file column now becomes useful):
dat = do.call("rbind.data.frame", dat)
I am more familiar with a for loop, which can be a bit more cumbersome.
filelist <- list.files(pattern = "\\.xlsx")
# list all the xlsx files from the directory
allxlsx.files <- list() # create a list to populate with xlsx data (if you wind to bind all the rows together)
count <- 1
for (file in filelist) {
dat <- read.xlsx(file, sheetIndex=1,
sheetName=NULL, startRow=5,
endRow=NULL, as.data.frame=TRUE,
header=TRUE) [c(5:10, 12,15)] # index your columns of interest
allxlsx.files[[count]] <-dat # creat a list of rows from xls files
count <- count + 1
}
convert back to data.frame
allfiles <- do.call(rbind.data.frame, allxlsx.files)
For a variation on Wyldsoul's answer, but using a for loop across multiple Excel sheets (between 1 and j) in the same Excel file, and binding with dplyr:
library(gdata)
library(dplyr)
for (i in 1:j) {
dat <- read.xls(f, sheet = i)
dat <- dat[,1:14] # index your columns of interest
allxlsx.files[[count]]
count <- count + 1
}
allfiles <- do.call(bind_rows, allxlsx.files)