How can I turn the filename into a variable when r

Posted 2019-01-18 16:48

Question:

I have a bunch of csv files that follow the naming scheme: est2009US.csv.

I am reading them into R as follows:

myFiles <- list.files(path = "~/Downloads/gtrends/", pattern = "^est[[:digit:]]{4}US\\.csv$", full.names = TRUE)

myDB <- do.call("rbind", lapply(myFiles, read.csv, header = TRUE))

I would like to find a way to create a new variable that, for each record, is populated with the name of the file the record came from.

Answer 1:

You can store the list returned by lapply first, then add the file name to each data frame before binding them together.

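# Read each file into a list of data frames, named after the files
Lapply <- lapply(myFiles, read.csv, header = TRUE)
names(Lapply) <- myFiles
# Tag every row with the file it came from
for (i in myFiles)
    Lapply[[i]]$Source <- i
myDB <- do.call(rbind, Lapply)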


Answer 2:

You can avoid looping twice by using an anonymous function that adds the file name as a column to each data.frame in the same lapply call that reads the CSV files.

myDB <- do.call("rbind", lapply(myFiles, function(x) {
  dat <- read.csv(x, header=TRUE)
  dat$fileName <- tools::file_path_sans_ext(basename(x))
  dat
}))

I stripped out the directory and file extension. basename() returns the file name, not including the directory, and tools::file_path_sans_ext() removes the file extension.
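
To try the approach above without touching real data, here is a minimal self-contained sketch. The temporary directory, the made-up column values, and the names tmp_dir, demo_files and demo_db are all assumptions for illustration and are not part of the original question.

```r
# Write two small CSV files (made-up data) to a temporary directory
tmp_dir <- file.path(tempdir(), "gtrends_demo")
dir.create(tmp_dir, showWarnings = FALSE)
write.csv(data.frame(state = c("AK", "AL"), est = c(1, 2)),
          file.path(tmp_dir, "est2009US.csv"), row.names = FALSE)
write.csv(data.frame(state = "AK", est = 3),
          file.path(tmp_dir, "est2010US.csv"), row.names = FALSE)

# full.names = TRUE returns full paths, so read.csv works from any working directory
demo_files <- list.files(tmp_dir, pattern = "^est[[:digit:]]{4}US\\.csv$",
                         full.names = TRUE)

# Read each file and add its name (without directory or extension) as a column
demo_db <- do.call("rbind", lapply(demo_files, function(x) {
  dat <- read.csv(x, header = TRUE)
  dat$fileName <- tools::file_path_sans_ext(basename(x))
  dat
}))

print(demo_db)
```

Each row of demo_db should carry either est2009US or est2010US in its fileName column, depending on which file it was read from.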



Answer 3:

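# Count the rows contributed by each file (sapply returns a plain vector for rep)
Nrows <- sapply(lapply(myFiles, read.csv, header = TRUE), NROW)
# It might have been easier to store the result of lapply(myFiles, read.csv, header = TRUE)
# Repeat each file name once per row it contributed to myDB
myDB$grp <- rep(myFiles, times = Nrows)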


Answer 4:

plyr makes this very easy:

library(plyr)
paths <- dir(pattern = "\\.csv$")
names(paths) <- basename(paths)

all <- ldply(paths, read.csv)

Because paths is named, all will automatically get a column containing those names.
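
A small sketch of that behaviour, assuming plyr is installed (the block is skipped otherwise). The temporary files a.csv and b.csv and the names tmp_dir, demo_paths and combined are hypothetical, chosen only for this illustration. As far as I know, the added column is called .id by default.

```r
# Sketch only: runs when the plyr package is available
if (requireNamespace("plyr", quietly = TRUE)) {
  # Two tiny CSV files with made-up data
  tmp_dir <- file.path(tempdir(), "ldply_demo")
  dir.create(tmp_dir, showWarnings = FALSE)
  write.csv(data.frame(est = 1:2), file.path(tmp_dir, "a.csv"), row.names = FALSE)
  write.csv(data.frame(est = 3), file.path(tmp_dir, "b.csv"), row.names = FALSE)

  # Name each path after its file so ldply can label the rows
  demo_paths <- dir(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
  names(demo_paths) <- basename(demo_paths)

  combined <- plyr::ldply(demo_paths, read.csv)
  print(combined)
}
```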



Tags: r csv import