How to import multiple .csv files at once?

2018-12-31 01:10发布

Suppose we have a folder containing multiple data.csv files, each containing the same number of variables but each from different times. Is there a way in R to import them all simultaneously rather than having to import them all individually?

My problem is that I have around 2000 data files to import and having to import them individually just by using the code:

read.delim(file="filename", header=TRUE, sep="\t")

is not very efficient.

标签: r csv import r-faq
12条回答
临风纵饮
2楼-- · 2018-12-31 01:25

As well as using lapply or some other looping construct in R you could merge your CSV files into one file.

In Unix, if the files had no headers, then its as easy as:

cat *.csv > all.csv

or if there are headers, and you can find a string that matches headers and only headers (ie suppose header lines all start with "Age"), you'd do:

cat *.csv | grep -v ^Age > all.csv

I think in Windows you could do this with COPY and SEARCH (or FIND or something) from the DOS command box, but why not install cygwin and get the power of the Unix command shell?

查看更多
梦该遗忘
3楼-- · 2018-12-31 01:28

Something like the following should result in each data frame as a separate element in a single list:

temp = list.files(pattern="*.csv")
myfiles = lapply(temp, read.delim)

This assumes that you have those CSVs in a single directory--your current working directory--and that all of them have the lower-case extension .csv.

If you then want to combine those data frames into a single data frame, see the solutions in other answers using things like do.call(rbind,...), dplyr::bind_rows() or data.table::rbindlist().

If you really want each data frame in a separate object, even though that's often inadvisable, you could do the following with assign:

temp = list.files(pattern="*.csv")
for (i in 1:length(temp)) assign(temp[i], read.csv(temp[i]))

Or, without assign, and to demonstrate (1) how the file name can be cleaned up and (2) show how to use list2env, you can try the following:

temp = list.files(pattern="*.csv")
list2env(
  lapply(setNames(temp, make.names(gsub("*.csv$", "", temp))), 
         read.csv), envir = .GlobalEnv)

But again, it's often better to leave them in a single list.

查看更多
只若初见
4楼-- · 2018-12-31 01:33

I use this successfully:

xlist<-list.files(pattern = "*.csv")

for(i in xlist) { 
  x <- read.csv((i))
  assign(i, x)
    }
查看更多
谁念西风独自凉
5楼-- · 2018-12-31 01:35

In my view, most of the other answers are obsoleted by rio::import_list, which is a succinct one-liner:

library(rio)
my_data <- import_list(dir("path_to_directory", pattern = ".csv", rbind = TRUE))

Any extra arguments are passed to rio::import. rio can deal with almost any file format R can read, and it uses data.table's fread where possible, so it should be fast too.

查看更多
只若初见
6楼-- · 2018-12-31 01:35

Building on dnlbrk's comment, assign can be considerably faster than list2env for big files.

library(readr)
library(stringr)

List_of_file_paths <- list.files(path ="C:/Users/Anon/Documents/Folder_with_csv_files/", pattern = ".csv", all.files = TRUE, full.names = TRUE)

By setting the full.names argument to true, you will get the full path to each file as a separate character string in your list of files, e.g., List_of_file_paths[1] will be something like "C:/Users/Anon/Documents/Folder_with_csv_files/file1.csv"

for(f in 1:length(List_of_filepaths)) {
  file_name <- str_sub(string = List_of_filepaths[f], start = 46, end = -5)
  file_df <- read_csv(List_of_filepaths[f])  
  assign( x = file_name, value = file_df, envir = .GlobalEnv)
}

You could use the data.table package's fread or base R read.csv instead of read_csv. The file_name step allows you to tidy up the name so that each data frame does not remain with the full path to the file as it's name. You could extend your loop to do further things to the data table before transferring it to the global environment, for example:

for(f in 1:length(List_of_filepaths)) {
  file_name <- str_sub(string = List_of_filepaths[f], start = 46, end = -5)
  file_df <- read_csv(List_of_filepaths[f])  
  file_df <- file_df[,1:3] #if you only need the first three columns
  assign( x = file_name, value = file_df, envir = .GlobalEnv)
}
查看更多
孤独寂梦人
7楼-- · 2018-12-31 01:36

This is the code I developed to read all csv files into R. It will create a dataframe for each csv file individually and title that dataframe the file's original name (removing spaces and the .csv) I hope you find it useful!

path <- "C:/Users/cfees/My Box Files/Fitness/"
files <- list.files(path=path, pattern="*.csv")
for(file in files)
{
perpos <- which(strsplit(file, "")[[1]]==".")
assign(
gsub(" ","",substr(file, 1, perpos-1)), 
read.csv(paste(path,file,sep="")))
}
查看更多
登录 后发表回答