I have seen other posts that detail how to import multiple sheets from a single excel file into R, and also how to import a series of excel files that have a single sheet in each, but not the two at the same time.
I am able to make the following code work in pieces at an individual file level, but I think I may be screwing up the loop somehow in some simple way.
When I execute the loop, I only get output from the first member of the list (files[1]) instead of output from all iterations of the loop appended in the all_data list object.
Here's my code:
# name filepath of excel files to import
file_path ="..."
# load names of excel files
files = list.files(path = file_path, pattern = ".xlsx", )
# create list to store data
all_data = list()
# create function to read multiple sheets per excel file
read_excel_allsheets <- function(filename, tibble = FALSE) {
sheets <- readxl::excel_sheets(filename)
x <- lapply(sheets, function(X) readxl::read_excel(filename, sheet = X))
x <- lapply(x, as.data.frame)
names(x) <- sheets
}
# execute function for all excel files in "files"
for (i in length(files)){
filename = paste0(file_path,"/", files[i])
read_excel_allsheets(filename)
all_data = c(all_data, x)
}
I suspect I'm just making a fundamental loop mistake, but have searched around and can't find out how to fix. any help is really appreciated!
For multiple sheets within multiple Excel files, you need nested looping. Consider nested lapply
calls. Specifically, return the appropriate object as right now it returns sheets for every call of function. Then, convert for
to lapply
for a list of objects:
# load names of excel files
files = list.files(path = "...", full.names = TRUE, pattern = ".xlsx")
# create function to read multiple sheets per excel file
read_excel_allsheets <- function(filename, tibble = FALSE) {
sheets <- readxl::excel_sheets(filename)
tibble_list <- lapply(sheets, function(sh) readxl::read_excel(filename, sheet = sh)
df_list <- lapply(x, as.data.frame)
names(df_list) <- sheets
return(df_list)
}
# execute function for all excel files in "files"
all_data <- lapply(files, read_excel_allsheets)
Even name you all_data list by each file's basename:
# name outer list
xl_base_names <- lapply(files, basename)
all_data <- setNames(all_data, xl_base_names)
Alternatively, shorten your function and use sapply(..., ..., simplify = FALSE
) which by default names list by supplied character vector:
read_excel_allsheets <- function(filename, tibble = FALSE) {
sheets <- readxl::excel_sheets(filename)
sapply(sheets, function(f) as.data.frame(readxl::read_excel(filename, sheet = f)),
simplify = FALSE)
}
see if something like this works for you:
library(readxl)
library(fs)
library(purrr)
file_names <- dir_ls("folder_name",
glob = "*.xlsx", ignore.case = TRUE)
x <- map_df(file_names, function(x){
sheet_names <- excel_sheets(x)
raw_data <- map_df(sheet_names, ~read_excel(x, sheet = .x))
return(raw_data)})
Here's an option that returns a data frame with columns for file and sheet names for each file. In this example, not every file has the same sheets or columns; test2.xlsx has only one sheet and test3.xlsx sheet1 does not have col3.
library(tidyverse)
library(readxl)
dir_path <- "~/test_dir/" # target directory where the xlsx files are located.
re_file <- "^test[0-9]\\.xlsx" # regex pattern to match the file name format, in this case 'test1.xlsx', 'test2.xlsx' etc.
read_sheets <- function(dir_path, file){
xlsx_file <- paste0(dir_path, file)
xlsx_file %>%
excel_sheets() %>%
set_names() %>%
map_df(read_excel, path = xlsx_file, .id = 'sheet_name') %>%
mutate(file_name = file) %>%
select(file_name, sheet_name, everything())
}
df <- list.files(dir_path, re_file) %>%
map_df(~ read_sheets(dir_path, .))
# A tibble: 15 x 5
file_name sheet_name col1 col2 col3
<chr> <chr> <dbl> <dbl> <dbl>
1 test1.xlsx Sheet1 1 2 4
2 test1.xlsx Sheet1 3 2 3
3 test1.xlsx Sheet1 2 4 4
4 test1.xlsx Sheet2 3 3 1
5 test1.xlsx Sheet2 2 2 2
6 test1.xlsx Sheet2 4 3 4
7 test2.xlsx Sheet1 1 3 5
8 test2.xlsx Sheet1 4 4 3
9 test2.xlsx Sheet1 1 2 2
10 test3.xlsx Sheet1 3 9 NA
11 test3.xlsx Sheet1 4 7 NA
12 test3.xlsx Sheet1 5 3 NA
13 test3.xlsx Sheet2 1 3 4
14 test3.xlsx Sheet2 2 5 9
15 test3.xlsx Sheet2 4 3 1