How to delete file extension from ending of basena

Posted 2019-03-06 19:27

Question:

This question already has an answer here:

  • Splitting a file name into name,extension 3 answers

How do I list the data files in a folder and store their filenames without their extensions as factors in a dataframe? In other words: How do I create a character vector from a list of filenames omitting the '.csv' extension and store this vector as a list of factors in a dataframe after creating that dataframe from those files?

My ultimate goal is to store the filenames containing my data as StudyIDs, as factors in a dataframe. I think this is an extremely simple task, but I have not worked out the correct form of the regular expression, or whether some interaction between sapply and gsub changes how it is interpreted.

Two folders, 'planned' and 'blurred', each contain files named 1.csv, 2.csv, etc., with sometimes non-sequential numbers. Specifically, I would like to obtain the factors "Blurred 1", "Planned 1", "Blurred 2", "Planned 2", etc., to label the data imported from these files by Study ID (number) and category (planned or blurred).

The code I've tried in RStudio 1.0.143, with a comment on what happens:

# Create a vector of the files to process
filenames <- list.files(path = '../Desktop/data/',full.names=TRUE,recursive=TRUE) 
# We parse the path to find the terminating filename which contains the StudyID.
FileEndings <- basename(filenames)
# We store this filename as the StudyID
regmatches('.csv',FileEndings,invert=TRUE) -> StudyID   # Error: ‘x’ and ‘m’ must have the same length
lapply(FileEndings,grep('.csv',invert=TRUE)) -> StudyID # Error: argument "x" is missing, with no default
sapply(FileEndings,grep,'.csv',invert=TRUE) -> StudyID; StudyID # Wrong: Gives named integer vector of 1's
sapply(FileEndings,grep,'.csv',invert=TRUE,USE.NAMES=FALSE) -> StudyID; StudyID # Wrong: Gives integer vector of 1's
sapply(FileEndings,gsub,'.csv',ignore.case=TRUE,invert=TRUE,USE.NAMES=FALSE) -> StudyID; StudyID # Error: unused argument (invert = TRUE)
sapply(FileEndings,gsub,'.csv','',ignore.case=TRUE,USE.NAMES=FALSE) -> StudyID; StudyID # Wrong: vector of ""
sapply(FileEndings,gsub,'[:alnum:].csv','[:alnum:]',ignore.case=TRUE,USE.NAMES=FALSE) -> StudyID; StudyID # Wrong: vector of "[:alnum:]"
sapply(FileEndings,gsub,'[[:alnum:]].csv','[[:alnum:]]',ignore.case=TRUE,USE.NAMES=FALSE) -> StudyID; StudyID # Wrong: vector of "[[:alnum:]]"
sapply(FileEndings,gsub,'[:alnum:]\.csv','[:alnum:]',ignore.case=TRUE,USE.NAMES=FALSE) -> StudyID; StudyID # Error: '\.' is an unrecognized escape

The documentation has not answered this question, and multiple webpages online provide overly simplistic examples that do not address this problem. I will continue searching, but I hope you will provide the solution to expedite this work and help future users. Thank you.

Answer 1:

There is a built-in function in the tools package for this: file_path_sans_ext.
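
A minimal sketch of this approach, assuming file names shaped like those in the question (the paths below are made up for illustration and stand in for the output of list.files):

```r
# Hypothetical paths standing in for the result of list.files(..., full.names = TRUE)
filenames <- c("../Desktop/data/planned/1.csv", "../Desktop/data/blurred/2.csv")

# basename() drops the directory part; file_path_sans_ext() drops the extension
StudyID <- tools::file_path_sans_ext(basename(filenames))
StudyID
```

The tools package ships with R, so calling the function as tools::file_path_sans_ext avoids a separate library() call.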



Answer 2:

If you intend to use basename, you might as well leave out the full.names argument of list.files (it is FALSE by default). Note that with recursive=TRUE the returned names still include the subfolder, e.g. planned/1.csv. I'm not entirely clear on your question, but does the following code help?

filenames <- list.files(path = 'DIRECTORY/', recursive = TRUE)
csvfiles <- filenames[grep("\\.csv$", filenames)] # keep only the names that end in .csv
finalnames <- sub("(.*)\\.csv$", "\\1", csvfiles) # keep the captured part before .csv
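
The replacement string decides what survives the substitution: an empty replacement deletes the whole match, while "\\1" keeps the captured group. A small sketch, using hypothetical relative paths of the kind list.files(recursive = TRUE) might return:

```r
# Hypothetical relative paths, for illustration only
csvfiles <- c("planned/1.csv", "blurred/12.csv")

# Empty replacement: the pattern matches the whole string, so everything is deleted
emptied <- sub("(.*)\\.csv$", "", csvfiles)
emptied

# "\\1" refers back to the captured group, i.e. everything before the final ".csv"
finalnames <- sub("(.*)\\.csv$", "\\1", csvfiles)
finalnames
```

Here emptied is a vector of empty strings, whereas finalnames keeps "planned/1" and "blurred/12".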


Answer 3:

I think you missed the $ in your regex for specifically replacing the file ending. What about

gsub(filenames, pattern="\\.csv$", replacement="")

This should truncate the file ending.
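
As a rough sketch of why the sapply attempts in the question return empty strings: gsub is already vectorised over its x argument, and sapply passes each file name as the first argument of gsub, which is the pattern. The vector below is a hypothetical stand-in for FileEndings.

```r
# Hypothetical basenames standing in for FileEndings
FileEndings <- c("1.csv", "2.csv", "10.csv")

# gsub() is vectorised over x, so no sapply() is needed.
# "\\." is a literal dot and "$" anchors the match to the end of the string.
StudyID <- gsub(pattern = "\\.csv$", replacement = "", x = FileEndings)
StudyID

# sapply(FileEndings, gsub, '.csv', '') calls gsub("1.csv", ".csv", "") for the
# first element: the file name becomes the pattern and x is the empty string.
misplaced <- gsub("1.csv", ".csv", "")
misplaced
```

The backslash has to be doubled inside an R string literal, which is why a single '\.' is rejected as an unrecognized escape.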

If you want to get rid of the path, too, then you could do a similar substitution for the path:

gsub(filenames, pattern="^.*AAPM2017//", replacement="")
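
If the folder name is also needed as the category, one possible sketch avoids regular expressions altogether by using basename and dirname. The paths, the column names and the studies data frame are assumptions made for illustration, not part of the original question's code.

```r
# Hypothetical paths; in practice these would come from
# list.files(..., full.names = TRUE, recursive = TRUE)
filenames <- c("data/planned/1.csv", "data/blurred/1.csv", "data/planned/3.csv")

category <- basename(dirname(filenames))                    # name of the containing folder
number   <- tools::file_path_sans_ext(basename(filenames))  # file name without ".csv"

# Capitalise the first letter of the folder name and append the number
label <- paste(paste0(toupper(substring(category, 1, 1)), substring(category, 2)), number)

# Store the labels as a factor column alongside the source file
studies <- data.frame(file = filenames, StudyID = factor(label), stringsAsFactors = FALSE)
studies
```

This gives labels of the form "Planned 1" and "Blurred 1", matching the naming described in the question.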