This question already has an answer here:
- Splitting a file name into name, extension
How do I list the data files in a folder and store their file names, without the extensions, as a factor column in a data frame? In other words: how do I build a character vector from a list of file names with the '.csv' extension removed, and store that vector as a factor in the data frame I create from those files?
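A minimal sketch of the whole task in base R, assuming the layout described below (subfolders `planned` and `blurred`, each holding numbered `.csv` files with the same columns). To stay self-contained it writes two small sample files into a temporary directory; the column name `value`, the `Category` column and the helper `read_study` are hypothetical. Each file is read with `read.csv()`, tagged with its extension-free file name and its folder, and the pieces are stacked with `do.call(rbind, ...)`.

```r
# Build a throwaway example tree (hypothetical data); point `root` at your real
# folder instead, e.g. '../Desktop/data/'
root <- file.path(tempdir(), "study_data")
dir.create(file.path(root, "planned"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "blurred"), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(value = c(10, 11)), file.path(root, "planned", "1.csv"), row.names = FALSE)
write.csv(data.frame(value = c(20, 21, 22)), file.path(root, "blurred", "2.csv"), row.names = FALSE)

# `pattern` is a regular expression, so the dot is escaped and the match anchored
filenames <- list.files(path = root, pattern = "\\.csv$", full.names = TRUE, recursive = TRUE)

# Hypothetical helper: read one file and record its ID and folder as columns
read_study <- function(path) {
  df <- read.csv(path)
  df$StudyID  <- sub("\\.csv$", "", basename(path))  # "1.csv" -> "1"
  df$Category <- basename(dirname(path))             # ".../planned/1.csv" -> "planned"
  df
}

# Read every file and stack the results (assumes all files share the same columns)
all_data <- do.call(rbind, lapply(filenames, read_study))

# Convert to factors once, after combining
all_data$StudyID  <- factor(all_data$StudyID)
all_data$Category <- factor(all_data$Category)
str(all_data)
```

Converting to factors after `rbind()` gives one set of levels covering all files. Because both folders can contain `1.csv`, the file name alone is not a unique ID, which is why the folder is kept alongside it.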
My ultimate goal is to use the names of the files containing my data as Study IDs, stored as a factor in a data frame. I think this is an extremely simple task, but I have not worked out the syntax the regular expression needs, or whether some interaction between sapply and gsub changes the result.
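On the regular-expression side, two details matter in R. `.` is a metacharacter that matches any character, so a literal dot needs `\\.` in the pattern: the regex engine wants `\.`, and the R string literal needs its backslash doubled, which is why `'\.'` is rejected as an unrecognized escape. POSIX classes such as `[:alnum:]` only work inside a bracket expression (`[[:alnum:]]`), and only in the pattern; in the replacement they are just literal text. A small base-R demonstration:

```r
# An unescaped dot matches any character, so ".csv" also matches "1xcsv"
grepl(".csv", "1xcsv")          # TRUE
# Escaped and anchored: only a real ".csv" at the end of the string matches
grepl("\\.csv$", "1xcsv")       # FALSE
grepl("\\.csv$", "1.csv")       # TRUE

# Removing the extension: the replacement is a plain string, not a class
sub("\\.csv$", "", "12.csv")    # "12"

# Alternatively, capture the part to keep with a bracket expression and a backreference
sub("^([[:alnum:]]+)\\.csv$", "\\1", "12.csv")   # "12"

# fixed = TRUE treats the pattern literally, so no escaping is needed
# (but it is not anchored, so it would also remove ".csv" in the middle of a name)
sub(".csv", "", "12.csv", fixed = TRUE)          # "12"
```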
Two folders, 'planned' and 'blurred', each contain files named 1.csv, 2.csv, etc.; the numbers are not always sequential. Specifically, I would like factor levels such as "Blurred 1", "Planned 1", "Blurred 2", "Planned 2", etc. to label the data imported from these files, identifying both the Study ID (the number) and the category (planned or blurred).
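One way to build labels like "Blurred 1" is to combine the parent folder name with the extension-free file name. This sketch uses a hard-coded vector of example paths in place of the `list.files()` output, and orders the factor levels by numeric Study ID and then by category. That ordering is an assumption about what is wanted; drop the `levels` argument if alphabetical order is acceptable.

```r
# Example paths standing in for list.files(..., full.names = TRUE, recursive = TRUE)
filenames <- c("data/blurred/1.csv", "data/blurred/2.csv", "data/blurred/10.csv",
               "data/planned/1.csv", "data/planned/2.csv")

# Category: the folder containing each file, with its first letter capitalised
folder   <- basename(dirname(filenames))            # "blurred", "planned", ...
category <- paste0(toupper(substring(folder, 1, 1)), substring(folder, 2))

# Study ID: the file name without its extension
id <- tools::file_path_sans_ext(basename(filenames))  # "1", "2", "10", ...

label <- paste(category, id)                          # "Blurred 1", "Blurred 2", ...

# Order levels numerically by ID, then by category, so "Blurred 2" sorts before "Blurred 10"
ord <- order(as.numeric(id), category)
StudyID <- factor(label, levels = unique(label[ord]))
levels(StudyID)
```

Sorting on `as.numeric(id)` rather than the text avoids the usual character ordering, in which "10" would come before "2".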
Here is what I've tried in RStudio 1.0.143, with a comment after each attempt describing what happens:
```r
# Create a vector of the files to process
filenames <- list.files(path = '../Desktop/data/',full.names=TRUE,recursive=TRUE)
# We parse the path to find the terminating filename which contains the StudyID.
FileEndings <- basename(filenames)
# We store this filename as the StudyID
regmatches('.csv',FileEndings,invert=TRUE) -> StudyID # Error: ‘x’ and ‘m’ must have the same length
lapply(FileEndings,grep('.csv',invert=TRUE)) -> StudyID # Error: argument "x" is missing, with no default
sapply(FileEndings,grep,'.csv',invert=TRUE) -> StudyID; StudyID # Wrong: Gives named integer vector of 1's
sapply(FileEndings,grep,'.csv',invert=TRUE,USE.NAMES=FALSE) -> StudyID; StudyID # Wrong: Gives integer vector of 1's
sapply(FileEndings,gsub,'.csv',ignore.case=TRUE,invert=TRUE,USE.NAMES=FALSE) -> StudyID; StudyID # Error: unused argument (invert = TRUE)
sapply(FileEndings,gsub,'.csv','',ignore.case=TRUE,USE.NAMES=FALSE) -> StudyID; StudyID # Wrong: vector of ""
sapply(FileEndings,gsub,'[:alnum:].csv','[:alnum:]',ignore.case=TRUE,USE.NAMES=FALSE) -> StudyID; StudyID # Wrong: vector of "[:alnum:]"
sapply(FileEndings,gsub,'[[:alnum:]].csv','[[:alnum:]]',ignore.case=TRUE,USE.NAMES=FALSE) -> StudyID; StudyID # Wrong: vector of "[[:alnum:]]"
sapply(FileEndings,gsub,'[:alnum:]\.csv','[:alnum:]',ignore.case=TRUE,USE.NAMES=FALSE) -> StudyID; StudyID # Error: '\.' is an unrecognized escape
```
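Most of these attempts fail for the same reason: `sapply(X, FUN, ...)` calls `FUN(X[[i]], ...)`, so each file name lands in the first argument of `grep()`/`gsub()`, which is `pattern`, while `'.csv'` becomes `x` (for `grep`) or `replacement` (for `gsub`). `gsub()` also has no `invert` argument, hence the "unused argument" error, and writing `grep('.csv', invert = TRUE)` inside `lapply()` calls `grep` immediately instead of passing it as a function. Since `sub()`/`gsub()` are already vectorised over `x`, no apply loop is needed. A sketch with example file names standing in for `basename(filenames)`:

```r
FileEndings <- c("1.csv", "2.csv", "17.csv")   # example values for basename(filenames)

# What sapply(FileEndings, gsub, '.csv', '') actually evaluates for the first element:
gsub("1.csv", ".csv", "")   # pattern = "1.csv", replacement = ".csv", x = ""  -> ""

# Naming pattern and replacement makes sapply pass each file name as x
StudyID <- sapply(FileEndings, gsub, pattern = "\\.csv$", replacement = "", USE.NAMES = FALSE)

# Simpler: sub() is vectorised over x, so no apply function is needed
StudyID <- sub("\\.csv$", "", FileEndings)

# Or strip any extension with the tools package that ships with R
StudyID <- tools::file_path_sans_ext(FileEndings)

StudyID            # "1" "2" "17"
factor(StudyID)    # levels sort as text: "1" "17" "2"
```

Note that `factor()` sorts levels as character strings; pass `levels =` explicitly if numeric order matters.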
The documentation has not answered this question, and the examples I have found online are too simple to cover this case. I will keep searching, but I hope someone can provide a solution to speed up this work and help future users. Thank you.