This question already has an answer here:
- How to import multiple .csv files at once? (12 answers)
I have hundreds of medium-sized Excel files (between 5,000 and 50,000 rows with about 100 columns) to load into R. They have a well-defined naming pattern, like `x_1.xlsx`, `x_2.xlsx`, etc.
How can I load these files into R in the fastest, most straightforward way?
With `list.files` you can create a list of all the filenames in your working directory. Next you can use `lapply` to loop over that list and read each file with the `read_excel` function from the readxl package:
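A minimal sketch (the `^x_[0-9]+\\.xlsx$` pattern is an assumption based on the `x_1.xlsx`, `x_2.xlsx` naming in the question; adjust it to your actual filenames):

```r
library(readxl)

# filenames of all matching Excel files in the working directory
file.list <- list.files(pattern = "^x_[0-9]+\\.xlsx$")

# read each file into a list of data frames
df.list <- lapply(file.list, read_excel)
```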
This method can of course also be used with other file-reading functions like `read.csv` or `read.table`. Just replace `read_excel` with the appropriate file-reading function and make sure you use the correct pattern in `list.files`.

If you also want to include the files in subdirectories, use:
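For example (same assumed pattern as above; `recursive = TRUE` is the `list.files` argument that descends into subdirectories):

```r
file.list <- list.files(pattern = "^x_[0-9]+\\.xlsx$", recursive = TRUE)
```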
Other possible packages for reading Excel files: openxlsx & xlsx
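For instance, a sketch with openxlsx's `read.xlsx` in place of `read_excel` (the rest of the approach stays the same):

```r
library(openxlsx)

# read the first sheet of each file
df.list <- lapply(file.list, read.xlsx, sheet = 1)
```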
Supposing the columns are the same for each file, you can bind them together in one dataframe with `bind_rows` from dplyr:
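A sketch (the `"id"` column name is just an example):

```r
library(dplyr)

# .id adds an identifier column based on the list names (or numbers)
df <- bind_rows(df.list, .id = "id")
```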
or with `rbindlist` from data.table:
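Again a sketch, with the same example column name:

```r
library(data.table)

# idcol adds an identifier column based on the list names (or numbers)
df <- rbindlist(df.list, idcol = "id")
```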
Both have the option to add an `id` column for identifying the separate datasets.

Update: If you don't want a numeric identifier, just use `sapply` with `simplify = FALSE` to read the files in `file.list`:
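A sketch; with `simplify = FALSE` (and the default `USE.NAMES = TRUE`), `sapply` returns a list whose names are the filenames:

```r
library(readxl)

# a named list of data frames, named after the elements of file.list
df.list <- sapply(file.list, read_excel, simplify = FALSE)
```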
When using `bind_rows` from dplyr or `rbindlist` from data.table, the `id` column now contains the filenames.

Yet another approach is using the purrr package:
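One possible sketch with purrr (`map_df` needs dplyr installed for the binding; `set_names` names the list elements after the filenames so they end up in the `id` column):

```r
library(purrr)
library(readxl)

# read all files and row-bind them, with the filenames in an "id" column
df <- map_df(set_names(file.list), read_excel, .id = "id")
```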
Other approaches to getting a named list: If you don't want just a numeric identifier, then you can assign the filenames to the dataframes in the list before you bind them together. There are several ways to do this:
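Two common ways (sketches; both give the list the filenames as names):

```r
# read the files first, then assign the names afterwards
df.list <- lapply(file.list, read_excel)
names(df.list) <- file.list

# or do both in one step with setNames()
df.list <- setNames(lapply(file.list, read_excel), file.list)
```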
Now you can bind the list of dataframes together in one dataframe with `rbindlist` from data.table or `bind_rows` from dplyr. The `id` column will now contain the filenames instead of a numeric identifier.
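With the named list, the same bind as before now carries the filenames in that column (sketch; the `"id"` name is again just an example):

```r
library(data.table)

# "id" now holds the source filename for every row
df <- rbindlist(df.list, idcol = "id")
```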