First off, this is related to a homework question for the Coursera R programming course. I have found other ways to do what I want to do but my research has led me to a question I'm curious about. I have a variable number of csv files that I need to pull data from and then take the mean of the "pollutant" column in said files. The files are listed in their directory with an id number. I put together the following code which works fine for a single csv file but doesn't work for multiple csv files:
pollutantmean <- function (directory, pollutant, id = 1:332) {
id <- formatC(id, width=3, flag="0")`
dataset<-read.csv(paste(directory, "/", id,".csv",sep=""),header=TRUE)`
mean(dataset[,pollutant], na.rm = TRUE)`
}
I also know how to rbind
multiple csv files together if I know the ids when I am creating the function, but I am not sure how to assign rbind
to a variable range of ids or if thats even possible. I found other ways to do it such as calling an lapply
and the unlisting the data, just curious if there is an easier way.
A vector is not accepted for 'file' in read.csv(file, ...)
Below is a slight modification of yours. A vector of file paths are created and they are looped by sapply.
Well, this uses an lapply, but it might be what you want.
This will turn all of your files into one large dataset, and from there it's easy to take the mean. Is that what you wanted?
An alternative way of doing this would be to step through file by file, taking sums and number of observations and then taking the mean afterwards, like so:
Note that both of these methods require that only your desired csvs are in that folder. You can use a pattern argument in the
list.files
call if you have other files in there that you're not interested in.