What is the fastest way to transfer files over a network?

Posted 2019-03-12 04:46

Question:

I'm trying to figure out the best way to transfer large amounts of data over a network between two systems. I am currently looking at FTP, HTTP, or rsync, and I am wondering which one is the fastest. I've looked online for some answers and found the following sites:

  • http://daniel.haxx.se/docs/ftp-vs-http.html
  • http://www.isi.edu/lsam/publications/http-perf/

The problem is that these are old and talk more about the theoretical differences in how the protocols communicate. I am more interested in actual benchmarks that can say that, for a specific setup, when transferring files of varying sizes one protocol is x% faster than the others.

Has anyone tested these and posted the results somewhere?

Answer 1:

Alright, so I set up the following test:

  • Hardware: two desktops with Intel Core Duo CPUs @ 2.33GHz and 4 GB of RAM.
  • OS: Ubuntu 11.10 on both machines
  • Network: a dedicated 100 Mb/s switch; both machines are connected to it.
  • Software (a minimal sketch of the HTTP pieces follows this list):
    • Python HTTP server (inspired by this).
    • Python FTP server (inspired by this).
    • Python HTTP client (inspired by this).
    • Python FTP client (inspired by this).

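For reference, here is a minimal sketch of what the HTTP half of such a setup can look like using only the Python standard library; the port, the chunk size, and the file layout are illustrative choices, not necessarily what the original scripts used. The server side can simply be the stock `python3 -m http.server 8000` run from the directory holding the test files.

```python
# http_client.py -- fetch each file over HTTP and report the elapsed time
# (sketch only; pair it with "python3 -m http.server 8000" on the server side)
import sys
import time
import urllib.request

def fetch_all(base_url, names, chunk=1 << 20):
    start = time.monotonic()
    total = 0
    for name in names:
        with urllib.request.urlopen(f"{base_url}/{name}") as resp:
            while True:
                block = resp.read(chunk)
                if not block:
                    break
                total += len(block)
    elapsed = time.monotonic() - start
    print(f"{len(names)} files, {total} bytes in {elapsed:.1f}s "
          f"({total / elapsed / 1e6:.1f} MB/s)")

if __name__ == "__main__":
    # usage: python http_client.py http://server:8000 file1 file2 ...
    fetch_all(sys.argv[1], sys.argv[2:])
```
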
I uploaded the following groups of files to each server (a generator script is sketched after the list):

  1. 1 100M file.
  2. 10 10M files.
  3. 100 1M files.
  4. 1,000 100K files.
  5. 10,000 10K files.

I got the following average results over multiple runs (numbers in seconds):

|-----------+---------+----------|
| File Size | FTP (s) | HTTP (s) |
|-----------+---------+----------|
|      100M |       8 |        9 |
|       10M |       8 |        9 |
|        1M |       8 |        9 |
|      100K |      14 |       12 |
|       10K |      46 |       41 |
|-----------+---------+----------| 

So, it seems that FTP is slightly faster for large files, and HTTP is a little faster for many small files. All in all, I think they are comparable, and the server implementation is much more important than the protocol.



Answer 2:

If the machines at each end are reasonably powerful (i.e. not netbooks, NAS boxes, toasters, etc.), then I would expect all protocols which work over TCP to be much the same speed at transferring bulk data. The application protocol's job is really just to fill a buffer for TCP to transfer, so as long as it can keep that buffer full, TCP will set the pace.
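
One way to convince yourself of this is to take the application protocol out of the picture and time a bare TCP copy of the same file; on a healthy link it should land in the same ballpark as FTP or HTTP. A rough sketch (host, port, and file paths are placeholders):

```python
# tcp_copy.py -- copy one file over a bare TCP connection (start the receiver first)
import socket
import sys

def send_file(host, path, port=9000):
    with socket.create_connection((host, port)) as sock, open(path, "rb") as f:
        # sendfile() hands the kernel the whole file; TCP sets the pace from here
        sock.sendfile(f)

def recv_file(out_path, port=9000):
    with socket.create_server(("0.0.0.0", port)) as srv:
        conn, _addr = srv.accept()
        with conn, open(out_path, "wb") as f:
            while True:
                chunk = conn.recv(1 << 20)
                if not chunk:
                    break
                f.write(chunk)

if __name__ == "__main__":
    # usage: python tcp_copy.py recv out.bin   |   python tcp_copy.py send HOST in.bin
    if sys.argv[1] == "recv":
        recv_file(sys.argv[2])
    else:
        send_file(sys.argv[2], sys.argv[3])
```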

Protocols which do compression or encryption may bottleneck at the CPU on less powerful machines. My netbook does FTP much faster than SCP.

rsync does clever things to transmit incremental changes quickly, but for bulk transfers it has no advantage over dumber protocols.



Answer 3:

Another utility to consider is bbcp: http://www.slac.stanford.edu/~abh/bbcp/.

A good, but dated, tutorial on using it is here: http://pcbunn.cithep.caltech.edu/bbcp/using_bbcp.htm. I have found that bbcp is extremely good at transferring large files (multiple GBs). In my experience, it is faster than rsync on average.



Answer 4:

rsync optionally compresses its data. That typically makes the transfer go much faster. See rsync -z.

You didn't mention scp, but scp -C also compresses.

Do note that compression might make the transfer go faster or slower, depending upon the speed of your CPU and of your network link. (Slower links and faster CPU make compression a good idea; faster links and slower CPU make compression a bad idea.) As with any optimization, measure the results in your own environment.
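
In that spirit, one crude way to measure it is to compress a representative sample of your data and model the transfer as a pipeline limited by either the compressor or the link. A sketch; the 100 Mbit/s link speed below is an assumption to replace with your own numbers:

```python
# compress_check.py -- rough estimate of whether rsync -z / scp -C is worth it
import time
import zlib

def compression_helps(sample: bytes, link_bytes_per_s: float, level: int = 6) -> bool:
    start = time.monotonic()
    compressed = zlib.compress(sample, level)
    compress_rate = len(sample) / (time.monotonic() - start)  # bytes/s the CPU can feed
    ratio = len(compressed) / len(sample)
    # with compression, throughput is limited by the slower of CPU and link
    compressed_time = len(sample) / min(compress_rate, link_bytes_per_s / ratio)
    plain_time = len(sample) / link_bytes_per_s
    return compressed_time < plain_time

if __name__ == "__main__":
    # use a reasonably large sample so the timing is meaningful
    with open("sample.bin", "rb") as f:
        sample = f.read(16 * 1024 * 1024)
    print(compression_helps(sample, link_bytes_per_s=100e6 / 8))  # assume 100 Mbit/s
```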



Answer 5:

I'm afraid that if you want to know the answer for your needs and setup, you either have to be more specific or do your own performance (and reliability) tests. It does help to have at least a rudimentary understanding of the protocols in question and how they communicate, so I'd consider the articles you've been quoting a helpful resource. It also helps to know which constraints the early inventors of these protocols faced: was their aim to keep network impact low, were they memory-starved, or did they have to count their CPU cycles? Here are a few things to consider or answer if you want an answer tailored to your situation:

  • OS/File System related:
    • are you copying between the same OS/FS combination, or do you have to worry about incompatibilities, such as file types without a matching equivalent at the receiving end?
    • In other words, do you have anything special to transport? Metadata, resource forks, extended attributes, and file permissions might either simply not be transported by the protocol/tool of your choice, or be meaningless at the receiving end.
    • The same goes for sparse files, which might end up bloated to full size at the other end of the copy, ruining any plans you had about sizing (a quick check for this is sketched at the end of this answer).
  • Physical constraints related:
    • Network impact
    • CPU load: nowadays compression is much "cheaper", since modern CPUs handle it far more easily than the machines around when most transfer protocols were designed.
    • failure tolerance - do you need to be able to pick up where an interrupted transfer left you, or do you prefer to start anew?
    • incremental transfers, or full transfers? Does an incremental transfer offer any big savings for you, or does your task require full transfers by design anyway? In the latter case, the added latency and memory cost of building the transfer list before the transfer starts is a less desirable tradeoff.
    • How well does the protocol utilize the MTU offered by your underlying network protocol?
    • Do you need to maintain a steady stream of data, for example to keep a tape drive streaming at the receiving end?

Lots of things to consider, and I'm sure the listing isn't even complete.
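
As a concrete example of the sparse-file item above, a quick pre-flight check can flag files whose allocated blocks are smaller than their reported size, so you know whether your tool needs a sparse-aware mode (rsync's --sparse, for instance). A Unix-only sketch:

```python
# find_sparse.py -- list files whose allocated blocks are smaller than their size
import os
import sys

def is_sparse(path: str) -> bool:
    st = os.stat(path)
    # st_blocks counts 512-byte blocks actually allocated on disk (POSIX)
    return st.st_blocks * 512 < st.st_size

if __name__ == "__main__":
    # usage: python find_sparse.py /path/to/tree
    for root, _dirs, files in os.walk(sys.argv[1]):
        for name in files:
            path = os.path.join(root, name)
            if is_sparse(path):
                print(path)
```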