R - grab first portion of file name

Posted 2019-08-31 06:34

I have lots of files in a directory dir, with the following format

[xyz][sequence of numbers or letters]_[single number]_[more stuff that does not matter].csv

for example, xyz39289_3_8932jda.csv

I would like to write a function that returns the first portion of every file name in that directory. By first portion, I mean the [xyz][sequence of numbers or letters] part, that is, everything before the first underscore. In the example above, this would be xyz39289. The function would ultimately return a list such as

[xyz39289, xyz9382, xyz03319927, etc]

How can I do this in R? In Java, I would do the following:

File[] files = new File(dir).listFiles();
ArrayList<String> output = new ArrayList<String>();
for(int i = 0; i < files.length; i++) {
   output.add(files[i].getName().substring(0, files[i].getName().indexOf("_")));
}

Tags: regex r file
3 answers
一纸荒年 Trace。
Post #2 · 2019-08-31 07:20

Here's another version. List all the files:

myfiles <- list.files(path="./dir")

Then split each file name on "_" and keep the first part:

myfiles.pre <- sapply(myfiles, function(x) strsplit(x, "_", fixed=TRUE)[[1]][1])
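
As a rough sketch of the same split-and-keep-first idea: strsplit() is already vectorised, so the per-file anonymous function can be dropped. The file names below are made up for illustration and stand in for the result of list.files("./dir"); vapply() is used here only so the result is guaranteed to be a plain character vector.

```r
# Hypothetical sample names standing in for list.files("./dir")
myfiles <- c("xyz39289_3_8932jda.csv", "xyz9382_1_abc.csv", "xyz03319927_7_q.csv")

# strsplit() returns a list with one character vector of pieces per file name
pieces <- strsplit(myfiles, "_", fixed = TRUE)

# Keep the first piece of each; character(1) asserts one string per element
myfiles.pre <- vapply(pieces, function(parts) parts[1], character(1))
print(myfiles.pre)
```

Unlike sapply() called directly on the file names, this result carries no names attribute, which may or may not matter for later use.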
SAY GOODBYE
Post #3 · 2019-08-31 07:24

It might be easiest to delete everything from the first _ onward.

sub("_.*$", "", files)
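
Since the question asks for a function, the one-liner could be wrapped up roughly as follows. This is only a sketch: file_prefixes is a hypothetical name, not a base R function, and the demo creates made-up files in a temporary directory so it can run anywhere.

```r
# file_prefixes() is a hypothetical helper name, not part of base R
file_prefixes <- function(dir) {
  files <- list.files(dir)   # bare file names, without the directory part
  sub("_.*$", "", files)     # drop the first "_" and everything after it
}

# Demo in a throwaway directory with made-up file names
demo_dir <- file.path(tempdir(), "prefix_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir,
                                c("xyz39289_3_8932jda.csv", "xyz9382_1_abc.csv"))))

prefixes <- file_prefixes(demo_dir)
print(prefixes)
```

Note that a file name containing no underscore is returned unchanged by sub(), so it would still show up in the result.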
Viruses.
Post #4 · 2019-08-31 07:25

After you get your list of files with list.files (and possibly keep only the files you want, those that begin with xyz), I'd use sub.

files <- list.files(dir)
files <- files[grep("^xyz",files, perl = TRUE)]
filepart <- sub("^(xyz[^_]*)_.*$","\\1",files, perl = TRUE)
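
To see why the filtering step matters, here is a small sketch on made-up names (readme_old.txt is an invented non-matching file). sub() returns any string that does not match the pattern unchanged, so without the filter such names would leak into the result.

```r
# Hypothetical sample names; the middle one does not start with "xyz"
files <- c("xyz39289_3_8932jda.csv", "readme_old.txt", "xyz9382_1_abc.csv")

# Without filtering, the non-matching name passes through sub() untouched
unfiltered <- sub("^(xyz[^_]*)_.*$", "\\1", files, perl = TRUE)
print(unfiltered)

# Filter first (value = TRUE returns the matching strings, not their indices)
kept <- grep("^xyz", files, value = TRUE)
filepart <- sub("^(xyz[^_]*)_.*$", "\\1", kept, perl = TRUE)
print(filepart)
```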

There's also a regexpr method that I'm less certain about. Something like

files <- list.files(dir)
matchdat <- regexpr("^xyz.*?(?=_)",files, perl = TRUE)
filepart <- regmatches(files, matchdat)
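
A small sketch of how the regexpr approach behaves, using made-up names (notes.txt is an invented non-matching file). regexpr() reports -1 for strings that do not match, and regmatches() silently drops those, so the result can be shorter than the input.

```r
# Hypothetical sample names; "notes.txt" will not match the pattern
files <- c("xyz39289_3_8932jda.csv", "notes.txt", "xyz9382_1_abc.csv")

# Lazy match from "xyz" up to (but not including) the first "_"
matchdat <- regexpr("^xyz.*?(?=_)", files, perl = TRUE)

# as.vector() strips the match attributes, leaving just the start positions
matched <- as.vector(matchdat) != -1
print(matched)

# Only the matched substrings are returned; non-matching names are omitted
filepart <- regmatches(files, matchdat)
print(filepart)
```

If the output needs to line up one-to-one with the input, the sub() approach (with or without filtering) may be easier to reason about.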