How do I list the data files in a folder and store their filenames, without their extensions, as factors in a dataframe? In other words: how do I create a character vector from a list of filenames, omitting the '.csv' extension, and store this vector as a factor column in the dataframe that I build from those files?
My ultimate goal is to store the names of the files containing my data as StudyIDs, as a factor in a dataframe. I think this is an extremely simple task, but I have not discovered the formatting required for the regular expression, or whether there is some interaction between sapply and gsub that changes the formatting.
Two folders, 'planned' and 'blurred', each contain files named 1.csv, 2.csv, etc. (the numbers are sometimes non-sequential). Specifically, I think it would be good to obtain the factors "Blurred 1", "Planned 1", "Blurred 2", "Planned 2", etc., to label the data imported from these files by Study ID (number) and category (planned or blurred).
The code I've tried in RStudio 1.0.143, with a comment on what happens:
# Create a vector of the files to process
filenames <- list.files(path = '../Desktop/data/',full.names=TRUE,recursive=TRUE)
# We parse the path to find the terminating filename which contains the StudyID.
FileEndings <- basename(filenames)
# We store this filename as the StudyID
regmatches('.csv',FileEndings,invert=TRUE) -> StudyID # Error: ‘x’ and ‘m’ must have the same length
lapply(FileEndings,grep('.csv',invert=TRUE)) -> StudyID # Error: argument "x" is missing, with no default
sapply(FileEndings,grep,'.csv',invert=TRUE) -> StudyID; StudyID # Wrong: Gives named integer vector of 1's
sapply(FileEndings,grep,'.csv',invert=TRUE,USE.NAMES=FALSE) -> StudyID; StudyID # Wrong: Gives integer vector of 1's
sapply(FileEndings,gsub,'.csv',ignore.case=TRUE,invert=TRUE,USE.NAMES=FALSE) -> StudyID; StudyID # Error: unused argument (invert = TRUE)
sapply(FileEndings,gsub,'.csv','',ignore.case=TRUE,USE.NAMES=FALSE) -> StudyID; StudyID # Wrong: vector of ""
sapply(FileEndings,gsub,'[:alnum:].csv','[:alnum:]',ignore.case=TRUE,USE.NAMES=FALSE) -> StudyID; StudyID # Wrong: vector of "[:alnum:]"
sapply(FileEndings,gsub,'[[:alnum:]].csv','[[:alnum:]]',ignore.case=TRUE,USE.NAMES=FALSE) -> StudyID; StudyID # Wrong: vector of "[[:alnum:]]"
sapply(FileEndings,gsub,'[:alnum:]\.csv','[:alnum:]',ignore.case=TRUE,USE.NAMES=FALSE) -> StudyID; StudyID # Error: '\.' is an unrecognized escape
The documentation has not answered this question, and multiple webpages online provide overly simplistic examples that do not address this problem. I will continue searching, but I hope you will provide the solution to expedite this work and help future users. Thank you.
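The empty strings and the literal "[:alnum:]" results recorded above follow from how sapply passes arguments: sapply(X, FUN, ...) calls FUN(X[[i]], ...), so each filename lands in gsub's first argument (pattern) and the strings after it become replacement and x. For example, sapply(FileEndings, gsub, '.csv', '') evaluates gsub("1.csv", ".csv", ""), which returns "". gsub is already vectorised over x, so no apply wrapper is needed. Separately, "\." is not a valid escape inside an R string literal; the regex escape for a literal dot has to be written "\\.". A minimal sketch, using a hypothetical vector of basenames in place of the real FileEndings:

```r
# Hypothetical basenames standing in for the FileEndings vector in the question
FileEndings <- c("1.csv", "2.csv", "10.csv")

# sapply() puts each filename in gsub's *pattern* slot:
# gsub(pattern = "1.csv", replacement = ".csv", x = "") returns ""
wrong <- sapply(FileEndings, gsub, ".csv", "", USE.NAMES = FALSE)
print(wrong)  # all empty strings

# gsub() is vectorised over x, so pass the whole vector and name the arguments.
# "\\." matches a literal dot and "$" anchors the match to the end of the string.
StudyID <- gsub(pattern = "\\.csv$", replacement = "", x = FileEndings, ignore.case = TRUE)
print(StudyID)
```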
If you intend to use basename, you might as well just leave out the full.names argument from list.files (as it is FALSE by default). I'm not entirely clear on your question, but does the following code help?
I think you missed the $ (the end-of-string anchor) in your regex for specifically replacing the file ending. What about
This should truncate the file ending.
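A minimal sketch of what the anchor and the escaped dot change, using a hypothetical vector of names chosen to show the difference. Without escaping, "." matches any character; without "$", every ".csv" in the name is removed, not only the final one:

```r
# Hypothetical filenames chosen to expose the difference between the two patterns
files <- c("1.csv", "2.csv", "data.csv.csv", "7xcsv")

# Unescaped and unanchored: "." matches any character, and every match is replaced
loose <- gsub(".csv", "", files)

# Escaped dot plus "$": only a literal ".csv" at the very end is removed
strict <- sub("\\.csv$", "", files)

print(loose)   # "1" "2" "data" "7"
print(strict)  # "1" "2" "data.csv" "7xcsv"
```

For names that only ever look like 1.csv, both patterns give the same result; the anchored form is simply the one that says what is meant.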
If you want to get rid of the path, too, then you could do a similar substitution for the path:
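A sketch of that path substitution, using hypothetical paths modelled on the ../Desktop/data/ layout from the question. The greedy ^.*/ matches everything up to the last slash. Base R's basename() does the same job, and basename(dirname()) recovers the planned/blurred folder name that the question wants in its labels:

```r
# Hypothetical paths in the style returned by list.files(..., full.names = TRUE)
paths <- c("../Desktop/data/blurred/1.csv", "../Desktop/data/planned/12.csv")

# Strip everything up to and including the last "/", then the ".csv" ending
ids <- sub("\\.csv$", "", sub("^.*/", "", paths))

# basename() performs the same path stripping without a regex
same <- identical(sub("^.*/", "", paths), basename(paths))

# The enclosing folder name gives the category
category <- basename(dirname(paths))
```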
There is a built-in function in the tools package for this: file_path_sans_ext.
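A fuller sketch under stated assumptions, tying this back to the goal in the question. It writes a throwaway copy of the planned/blurred layout under tempdir() (the folder name study_data and the column value are hypothetical), strips extensions with tools::file_path_sans_ext(), builds labels such as "Blurred 1" from the folder name and number, and stacks the files into one data frame with StudyID as a factor:

```r
library(tools)  # file_path_sans_ext() ships with R's tools package

# Hypothetical stand-in for ../Desktop/data/: two folders of numbered CSV files
root <- file.path(tempdir(), "study_data")
for (group in c("planned", "blurred")) {
  dir.create(file.path(root, group), recursive = TRUE, showWarnings = FALSE)
  for (id in c(1, 2, 5)) {
    write.csv(data.frame(value = id * 10),
              file.path(root, group, paste0(id, ".csv")), row.names = FALSE)
  }
}

# Keep full paths so read.csv() can open each file
filenames <- list.files(root, pattern = "\\.csv$", full.names = TRUE, recursive = TRUE)

number   <- file_path_sans_ext(basename(filenames))  # "1", "2", "5", ...
category <- basename(dirname(filenames))             # "blurred" or "planned"
# Capitalise the folder name and append the number, e.g. "Blurred 1"
label <- paste(paste0(toupper(substring(category, 1, 1)), substring(category, 2)),
               number)

# Read each file, tag its rows with its label, then stack everything
pieces <- Map(function(f, l) {
  d <- read.csv(f)
  d$StudyID <- l
  d
}, filenames, label)
combined <- do.call(rbind, unname(pieces))
combined$StudyID <- factor(combined$StudyID)
```

Pointing root at the real data folder instead of the temporary one should give the same structure, assuming every CSV has the same columns, since rbind() on data frames requires matching column names.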