I have a bunch of csv files that follow the naming scheme: est2009US.csv.
I am reading them into R as follows:
myFiles <- list.files(path = "~/Downloads/gtrends/", pattern = "^est[[:digit:]]{4}US\\.csv$")
myDB <- do.call("rbind", lapply(myFiles, read.csv, header = TRUE))
I would like to find a way to create a new variable that, for each record, is populated with the name of the file the record came from.
You can create the object from lapply first.
Lapply <- lapply(myFiles, read.csv, header = TRUE)
names(Lapply) <- myFiles
# Tag every data frame with the name of the file it was read from
for (i in myFiles)
  Lapply[[i]]$Source <- i
myDB <- do.call(rbind, Lapply)
You can avoid looping twice by using an anonymous function that adds the file name as a column to each data.frame in the same lapply that you use to read the CSVs.
myDB <- do.call("rbind", lapply(myFiles, function(x) {
  dat <- read.csv(x, header = TRUE)
  dat$fileName <- tools::file_path_sans_ext(basename(x))
  dat
}))
I stripped out the directory and file extension: basename() returns the file name without the directory, and tools::file_path_sans_ext() removes the file extension.
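To see what those two helpers return, here is a minimal sketch. The path is hypothetical and only mirrors the naming scheme from the question; the last line, which pulls the year out of the name, is an extra illustration and not part of the answer above.

```r
# Hypothetical path, used only for illustration
x <- "~/Downloads/gtrends/est2009US.csv"

file_name <- basename(x)                        # "est2009US.csv"
stem <- tools::file_path_sans_ext(file_name)    # "est2009US"

# If only the year is wanted, it can be extracted from the stem
year <- as.integer(sub("^est([0-9]{4})US$", "\\1", stem))  # 2009
```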
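# Count the rows contributed by each file
Nrows <- sapply(lapply(myFiles, read.csv, header = TRUE), NROW)
# might have been easier to store: lapply(myFiles, read.csv, header=TRUE)
# Repeat each file name once per row, in the order the files were bound into myDB
myDB$grp <- rep(myFiles, Nrows)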
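As the comment suggests, storing the list of data frames avoids reading every file twice. A self-contained sketch of that variant follows; the temporary directory, the made-up data, and the names demoFiles, demoList, and demoDB are all assumptions for illustration.

```r
# Two small throwaway CSV files in a temporary directory (made-up data)
tmp <- file.path(tempdir(), "gtrends_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(tmp, "est2009US.csv"), row.names = FALSE)
write.csv(data.frame(x = 4:5), file.path(tmp, "est2010US.csv"), row.names = FALSE)

demoFiles <- list.files(tmp, pattern = "^est[0-9]{4}US\\.csv$", full.names = TRUE)

# Read each file once and keep the list, so the row counts are available later
demoList <- lapply(demoFiles, read.csv, header = TRUE)
demoDB <- do.call(rbind, demoList)

# One file name per row, in the same order in which rbind stacked the pieces
demoDB$grp <- rep(basename(demoFiles), times = sapply(demoList, nrow))
table(demoDB$grp)
```

This only works because rbind keeps the files in list order, so the repeated names line up with the rows.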
plyr makes this very easy:
library(plyr)
paths <- dir(pattern = "\\.csv$")
names(paths) <- basename(paths)
all <- ldply(paths, read.csv)
Because paths is named, all will automatically get a column (.id) containing those names.
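A quick way to check this behaviour is to run it on throwaway files. The sketch below assumes the plyr package is installed; the temporary directory, the made-up data, and the names demoPaths and demoAll are hypothetical.

```r
library(plyr)  # assumes the plyr package is installed

# Throwaway CSV files with made-up data
tmp <- file.path(tempdir(), "plyr_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(tmp, "est2009US.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5), file.path(tmp, "est2010US.csv"), row.names = FALSE)

demoPaths <- dir(tmp, pattern = "\\.csv$", full.names = TRUE)
names(demoPaths) <- basename(demoPaths)

# ldply reads each file and row-binds the results into one data frame
demoAll <- ldply(demoPaths, read.csv)
names(demoAll)  # the first column, .id, holds the names of demoPaths
```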