This question already has an answer here:
-
How to import multiple .csv files at once?
12 answers
I have hundreds of medium sized Excel files (between 5000 and 50.0000 rows with about 100 columns) to load into R. They have a well-defined naming pattern, like x_1.xlsx
, x_2.xlsx
, etc.
How can I load these files into R in the fastest, most straightforward way?
With list.files
you can create a list of all the filenames in your workingdirectory. Next you can use lapply
to loop over that list and read each file with the read_excel
function from the readxl
package:
library(readxl)
file.list <- list.files(pattern=\'*.xlsx\')
df.list <- lapply(file.list, read_excel)
This method can off course also be used with other file reading functions like read.csv
or read.table
. Just replace read_excel
with the appropriate file reading function and make sure you use the correct pattern in list.files
.
If you also want to include the files in subdirectories, use:
file.list <- list.files(pattern=\'*.xlsx\', recursive = TRUE)
Other possible packages for reading Excel-files: openxlsx & xlsx
Supposing the columns are the same for each file, you can bind them together in one dataframe with bind_rows
from dplyr:
library(dplyr)
df <- bind_rows(df.list, .id = \"id\")
or with rbindlist
from data.table:
library(data.table)
df <- rbindlist(df.list, idcol = \"id\")
Both have the option to add a id
column for identifying the separate datasets.
Update: If you don\'t want a numeric identifier, just use sapply
with simplify = FALSE
to read the files in file.list
:
df.list <- sapply(file.list, read.csv, simplify=FALSE)
When using bind_rows
from dplyr or rbindlist
from data.table, the id
column now contains the filenames.
Even another approach is using the purrr
-package:
library(purrr)
file.list <- list.files(pattern=\'*.csv\')
file.list <- setNames(file.list, file.list) # only needed when you need an id-column with the file-names
df <- map_df(file.list, read.csv, .id = \"id\")
Other approaches to getting a named list: If you don\'t want just a numeric identifier, than you can assign the filenames to the dataframes in the list before you bind them together. There are several ways to do this:
# with the \'attr\' function from base R
attr(df.list, \"names\") <- file.list
# with the \'names\' function from base R
names(df.list) <- file.list
# with the \'setattr\' function from the \'data.table\' package
setattr(df.list, \"names\", file.list)
Now you can bind the list of dataframes together in one dataframe with rbindlist
from data.table or bind_rows
from dplyr. The id
column will now contain the filenames instead of a numeric indentifier.