Loop in R to read many files

Posted 2019-01-01 02:48

Question:

I have been wondering whether anybody knows a way to write a loop that loads files/databases in R. Say I have files named like this: data1.csv, data2.csv, ..., data100.csv.

In some programming languages you can write something like data +{ x }+ .csv, which the system interprets as datax.csv, and then loop over x.

Any ideas?

Answer 1:

Sys.glob() is another possibility; its sole purpose is globbing, i.e. wildcard expansion.

dataFiles <- lapply(Sys.glob("data*.csv"), read.csv)

That reads all the files of the form data[x].csv into the list dataFiles, where [x] can be any sequence of characters, including none.

[Note that this is a different kind of pattern from the one in @Joshua's answer. There, list.files() takes a regular expression, whereas Sys.glob() uses standard wildcards. Which wildcards are supported is system dependent; details are on the help page ?Sys.glob.]
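To make the wildcard/regex difference concrete, here is a small sketch that uses glob2rx() from the utils package to translate the wildcard pattern into an equivalent regular expression. The file names are made up for illustration and nothing is read from disk.

```r
# Hypothetical file names, used only to illustrate the matching rules.
candidates <- c("data1.csv", "data.csv", "mydata1.csv", "data1.csv.bak")

# glob2rx() converts a wildcard pattern into an anchored regular expression.
rx <- glob2rx("data*.csv")

# The anchored regex keeps only names that start with "data" and end in ".csv".
glob_like <- candidates[grepl(rx, candidates)]

# The unanchored regex "data.*csv" also matches names that merely contain it.
loose <- candidates[grepl("data.*csv", candidates)]
```

Here glob_like keeps only data1.csv and data.csv, while loose keeps all four names, which is why a regex passed to list.files() is usually safer when anchored.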



Answer 2:

See ?list.files.

myFiles <- list.files(pattern = "data.*csv")

Then you can loop over myFiles.
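One way the loop over myFiles might look is sketched below. The temporary directory and the three small files are created only so the example runs on its own; the anchored pattern "^data.*\\.csv$" is a stricter variant chosen for this sketch and is not part of the answer.

```r
# Work in a temporary directory with three small made-up files (illustration only).
tmp <- file.path(tempdir(), "loop_demo")
dir.create(tmp, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(id = i, value = i * 10),
            file.path(tmp, paste0("data", i, ".csv")), row.names = FALSE)
}

# Anchor the regex and escape the dot so only data*.csv files match.
myFiles <- list.files(tmp, pattern = "^data.*\\.csv$", full.names = TRUE)

# Pre-allocate the result list, then fill it one file at a time.
results <- vector("list", length(myFiles))
for (i in seq_along(myFiles)) {
  results[[i]] <- read.csv(myFiles[i])
}
```

Using seq_along() rather than 1:length() means the loop body is simply skipped when no files match.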



Answer 3:

I would put all the CSV files in a directory, create a list and do a loop to read all the csv files from the directory in the list.

setwd("~/Documents/")
ldf <- list() # creates an empty list
listcsv <- dir(pattern = "\\.csv$") # names of all the CSV files in the directory
for (k in seq_along(listcsv)) { # seq_along() is safe when no files are found
  ldf[[k]] <- read.csv(listcsv[k])
}
str(ldf[[1]])


Answer 4:

fi <- list.files(directory_path, full.names = TRUE)
dat <- lapply(fi, read.csv)

dat will then contain the datasets as a list, one element per file.
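A possible refinement is to name each list element after its file so that a dataset can be looked up by name instead of by position. In this sketch directory_path points at a temporary directory holding two made-up files (alpha.csv and beta.csv), and the "\\.csv$" pattern is an added assumption that restricts the listing to CSV files.

```r
# Made-up files in a temporary directory stand in for directory_path.
directory_path <- file.path(tempdir(), "named_demo")
dir.create(directory_path, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(directory_path, "alpha.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5), file.path(directory_path, "beta.csv"), row.names = FALSE)

fi <- list.files(directory_path, pattern = "\\.csv$", full.names = TRUE)
dat <- lapply(fi, read.csv)

# Name each element after its file, without the directory or the extension.
names(dat) <- tools::file_path_sans_ext(basename(fi))

# Elements can now be accessed by name.
beta_rows <- nrow(dat[["beta"]])
```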



Answer 5:

Read every CSV file in the working directory and stack them into one merged data frame; bind_rows() matches the columns by their header names.

library(dplyr)
library(readr)

list_file <- list.files(pattern = "\\.csv$") %>%
  lapply(read.csv, stringsAsFactors = FALSE) %>%
  bind_rows()
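When the files are merged like this it is often useful to remember which file each row came from. The sketch below does that with the .id argument of bind_rows(); the temporary files and the column name source_file are assumptions made for the example.

```r
library(dplyr)

# Two made-up files with partly different columns (illustration only).
tmp <- file.path(tempdir(), "bind_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, score = c(5, 7)),
          file.path(tmp, "data1.csv"), row.names = FALSE)
write.csv(data.frame(id = 3, score = 9, group = "b"),
          file.path(tmp, "data2.csv"), row.names = FALSE)

files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
names(files) <- basename(files) # lapply() carries these names into the list

# .id stores each list element's name in a new column of the merged data frame.
combined <- files %>%
  lapply(read.csv, stringsAsFactors = FALSE) %>%
  bind_rows(.id = "source_file")
```

Columns that are missing from a file (group in data1.csv here) are filled with NA in the merged result.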


Answer 6:

Let's assume that your files follow the naming pattern you mentioned in your question and that they are located in the working directory.

If the files have a simple naming structure, you can build the vector of file names directly and then apply a loading function to all of them (here I used the purrr package, but you can also use lapply).

library(purrr)
1:100 %>% paste0("data", ., ".csv") %>% map(read.csv)
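Building the names from the numbers also keeps the files in numeric order, whereas an alphabetical directory listing would typically put data10.csv before data2.csv. A base R sketch of the same idea, which additionally skips numbers that have no file, could look like this; the temporary directory and the three made-up files are only there to make it runnable.

```r
# Three made-up files, deliberately with a gap in the numbering.
tmp <- file.path(tempdir(), "seq_demo")
dir.create(tmp, showWarnings = FALSE)
for (i in c(1L, 2L, 10L)) {
  write.csv(data.frame(n = i),
            file.path(tmp, sprintf("data%d.csv", i)), row.names = FALSE)
}

# Build the expected names in numeric order, then keep only files that exist.
wanted <- file.path(tmp, sprintf("data%d.csv", 1:10))
present <- wanted[file.exists(wanted)]

# Read the existing files; the list is ordered data1, data2, data10.
dat <- lapply(present, read.csv)
```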


Answer 7:

This may be helpful if you have datasets for participants as in psychology/sports/medicine etc.

library(haven) # provides read_sav()

setwd("C:/yourpath")

temp <- list.files(pattern = "\\.sav$")

#Maybe you want to exclude some IDs
DEL <- grepl("ID(04|08|11|13|19)\\.sav", temp)
temp2 <- temp[!DEL] # logical indexing also works when nothing matches

#Make a list that contains all data
read.all <- lapply(temp2, read_sav)
#View(read.all[[1]])

#Option 1: stack the datasets one under the next
df <- do.call("rbind", read.all)

#Option 2: compute something within each dataset (single IDs), e.g. the mean of certain parts for each participant

mw_extraktion <- function(data_raw) {
  data_raw <- data.frame(data_raw)
  #you may now calculate e.g. the mean of a certain variable for each ID
  ID <- data_raw$ID[1]
  #Var2 and Var3 are placeholders: replace them with the values you computed above
  c(ID, Var2, Var3)
} #end of function
data_combined <- t(data.frame(sapply(read.all, mw_extraktion)))
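To illustrate Option 2 without any .sav files, here is a minimal base R sketch. The two small data frames stand in for the list returned by lapply(), and the column rt, the function summarise_one() and the output column mean_rt are hypothetical names chosen for the example.

```r
# Two made-up participant data frames stand in for the list of datasets.
read.all <- list(
  data.frame(ID = 1, rt = c(300, 320, 340)),
  data.frame(ID = 2, rt = c(410, 390, NA))
)

# Hypothetical summary: one row per participant with the mean of column rt.
summarise_one <- function(data_raw) {
  data.frame(ID = data_raw$ID[1],
             mean_rt = mean(data_raw$rt, na.rm = TRUE))
}

# Apply the summary to every dataset and stack the one-row results.
data_combined <- do.call(rbind, lapply(read.all, summarise_one))
```

Returning a one-row data frame from the function keeps the column names and types, so no transposing is needed afterwards.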


Tags: r loops