Performance problems with list.files

Posted 2019-07-03 16:02

I am trying to retrieve files from 3 network drives using list.files, and it takes forever. When I use find in the shell, it returns all results in less than 15 seconds.

system.time(
  jnk <- list.files(c("/Volumes/massspec", "/Volumes/massspec2", "/Volumes/massspec3"), 
                    pattern='_MA_.*_HeLa_', 
                    recursive=TRUE))
#   user  system elapsed 
#  1.567   6.381 309.500 

Here is the equivalent shell command.

time find /Volumes/masssp* -name '*_MA_*_HeLa_*'
# real  0m13.776s
# user  0m0.361s
# sys   0m0.620s
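
Wrapping the same find in a system() call keeps everything in R; a minimal sketch (the -name glob is quoted so the shell does not expand it before find sees it):

system.time(
  jnk <- system("find /Volumes/masssp* -name '*_MA_*_HeLa_*'", intern=TRUE))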

But I need a solution that works on both Windows and Unix systems. Does anyone have a good idea? The network drives hold about 120,000 files altogether, but about 16 TB, so not many files, but very large ones.

Tags: r shell find
1 Answer

chillily · 2019-07-03 16:34

Based on the comment, I wrote a little R function that should work on both Windows and Unix...

quickFileSearch <- function(path, pattern) {
  switch(.Platform$OS.type,
         unix={
           # build: find <path> ... -name '<pattern>'
           # shQuote() protects paths with spaces and keeps the shell
           # from expanding the glob before find sees it
           paths <- paste(shQuote(path), collapse=' ')
           command <- paste('find', paths, '-name', shQuote(pattern))
           system(command, intern=TRUE)
         },
         windows={
           # build: dir "<path>\<pattern>" ... /b /s /a-d
           # /b bare paths, /s recurse into subdirectories, /a-d files only
           paths <- paste(shQuote(file.path(path, pattern,
                                            fsep='\\')),
                          collapse=' ')
           command <- paste('dir', paths, '/b /s /a-d')
           shell(command, intern=TRUE)
         })
}
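
For example, to reproduce the search from the question (note that the pattern is a shell glob here, not the regular expression that list.files() expects):

hits <- quickFileSearch(c("/Volumes/massspec", "/Volumes/massspec2", "/Volumes/massspec3"),
                        "*_MA_*_HeLa_*")
head(hits)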

The whole thing is not thoroughly tested yet, but it works for my purposes.
