Copying files over network MUCH slower with `file.

Posted 2019-06-18 01:14

I have been having some issues with R becoming very sluggish when accessing files over our corporate network. So I stepped back and did some testing, and I was shocked to discover that R's file.copy() is MUCH slower than the equivalent copy made with system('cp ...'). Is this a known issue, or am I doing something wrong here?

Here's my test:

I have 3 files:

  • large_random.txt - ~100MB
  • medium_random.txt - ~10MB
  • small_random.txt - ~1 MB

I created these on my Mac like so:

dd if=/dev/urandom of=small_random.txt bs=1048576 count=1
dd if=/dev/urandom of=medium_random.txt bs=1048576 count=10
dd if=/dev/urandom of=large_random.txt bs=1048576 count=100

But the following R tests were all done using Windows running in a virtual machine. The J drive is local and the N drive is 700 miles away.

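library(tictoc)

# Time the same copy two ways: R's file.copy() and a cp run through system().
# Each timed section also removes any existing destination file first.
test_copy <- function(source, des){
  tic('r file.copy')
  file.remove(des)
  file.copy(source, des)
  toc()

  tic('system call')
  # rm and cp must be available on the PATH
  system(paste('rm', des))
  system(paste('cp', source, des))
  toc()
}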

source <- 'J:\\tidy_examples\\dummyfiles\\small_random.txt'
des <- 'N:\\JAL\\2018\\_temp\\small_random.txt'
test_copy(source, des)

source <- 'J:\\tidy_examples\\dummyfiles\\medium_random.txt'
des <- 'N:\\JAL\\2018\\_temp\\medium_random.txt'
test_copy(source, des)

source <- 'J:\\tidy_examples\\dummyfiles\\large_random.txt'
des <- 'N:\\JAL\\2018\\_temp\\large_random.txt'
test_copy(source, des)

Which results in the following:

> source <- 'J:\\tidy_examples\\dummyfiles\\small_random.txt'
> des <- 'N:\\JAL\\2018\\_temp\\small_random.txt'
> test_copy(source, des)
r file.copy: 6.49 sec elapsed
system call: 2.12 sec elapsed
> 
> source <- 'J:\\tidy_examples\\dummyfiles\\medium_random.txt'
> des <- 'N:\\JAL\\2018\\_temp\\medium_random.txt'
> test_copy(source, des)
r file.copy: 56.86 sec elapsed
system call: 4.65 sec elapsed
> 
> source <- 'J:\\tidy_examples\\dummyfiles\\large_random.txt'
> des <- 'N:\\JAL\\2018\\_temp\\large_random.txt'
> test_copy(source, des)
r file.copy: 562.94 sec elapsed
system call: 31.01 sec elapsed
> 
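To make the comparison easier to read, the timings above can be turned into slowdown ratios and rough throughput figures. This is only arithmetic on the numbers already reported; the sizes in MiB come from the dd commands, and the column names are made up for illustration.

```r
# Timings (seconds) copied from the results above; sizes follow from the
# dd commands (bs=1048576 with count = 1, 10, 100).
timings <- data.frame(
  file        = c("small_random.txt", "medium_random.txt", "large_random.txt"),
  size_mib    = c(1, 10, 100),
  file_copy_s = c(6.49, 56.86, 562.94),
  system_s    = c(2.12, 4.65, 31.01)
)

# How many times slower file.copy() was than the system call.
timings$slowdown <- timings$file_copy_s / timings$system_s

# Effective throughput in MiB per second for each method.
timings$file_copy_mib_s <- timings$size_mib / timings$file_copy_s
timings$system_mib_s    <- timings$size_mib / timings$system_s

print(timings, digits = 3)
```

On these numbers file.copy() stays at a nearly constant rate (roughly 0.15 to 0.18 MiB/s), while the system call moves more data per second as the file grows. Each figure comes from a single run, and each timed section includes removing the old file, so this describes the measurements rather than explaining them.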

So what's going on that makes the system call so much faster? For the large file, file.copy() is more than 18x slower (562.94 sec vs. 31.01 sec)!
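One way to narrow this down would be to check whether the size of each read and write matters on the slow link. The sketch below is a rough probe, not a reproduction of what file.copy() does internally: copy_chunked() is a hypothetical helper that copies a file in fixed-size chunks with readBin() and writeBin(), and the loop times it for a few chunk sizes. It runs on temporary local files so it is self-contained; the J: and N: paths from the test would have to be swapped in to measure the network case.

```r
# Hypothetical helper: copy src to dst in fixed-size chunks over binary connections.
copy_chunked <- function(src, dst, chunk_bytes = 1024 * 1024) {
  in_con <- file(src, open = "rb")
  on.exit(close(in_con), add = TRUE)
  out_con <- file(dst, open = "wb")
  on.exit(close(out_con), add = TRUE)

  repeat {
    chunk <- readBin(in_con, what = "raw", n = chunk_bytes)
    if (length(chunk) == 0L) break  # end of file reached
    writeBin(chunk, out_con)
  }
  invisible(TRUE)
}

# Demo source: 2 MiB of random bytes in a temporary file (assumption: local temp dir).
src <- tempfile(fileext = ".bin")
writeBin(as.raw(sample.int(256L, 2 * 1024 * 1024, replace = TRUE) - 1L), src)

# Time the same copy with a few chunk sizes and confirm the copy is byte-identical.
for (chunk_bytes in c(4096, 65536, 1024 * 1024)) {
  dst <- tempfile(fileext = ".bin")
  elapsed <- system.time(copy_chunked(src, dst, chunk_bytes))[["elapsed"]]
  same <- identical(unname(tools::md5sum(src)), unname(tools::md5sum(dst)))
  cat(sprintf("chunk = %7d bytes: %.2f sec elapsed, identical = %s\n",
              as.integer(chunk_bytes), elapsed, same))
  file.remove(dst)
}
```

If the elapsed time over the network drops sharply as the chunk size grows, that would suggest per-request latency dominates the transfer. If it barely changes, the cause probably lies elsewhere.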

Tags: r drive