How can you concatenate two huge files with very little spare disk space?

Posted 2019-03-16 01:18

Suppose that you have two huge files (several GB) that you want to concatenate together, but that you have very little spare disk space (let's say a couple hundred MB). That is, given file1 and file2, you want to end up with a single file which is the result of concatenating file1 and file2 together byte-for-byte, and delete the original files.

You can't do the obvious cat file2 >> file1; rm file2, since in between the two operations, you'd run out of disk space.

Solutions on any and all platforms with free or non-free tools are welcome; this is a hypothetical problem I thought up while I was downloading a Linux ISO the other day, and the download got interrupted partway through due to a wireless hiccup.

15 answers
淡お忘
#2 · 2019-03-16 01:51

OK, for theoretical entertainment, and only if you promise not to waste your time actually doing it:

  • files are stored on disk in pieces
  • the pieces are linked in a chain

So you can concatenate the files by:

  • linking the last piece of the first file to the first piece of the second file
  • altering the directory entry for the first file to update its last piece and file size
  • removing the directory entry for the second file
  • cleaning up the first file's end-of-file marker, if any
  • note that if the last piece of the first file is only partially filled, you will have to copy data "up" the pieces of the second file to avoid having garbage in the middle of the result [thanks @Wedge!]

This would be optimally efficient: minimal alterations, minimal copying, no spare disk space required.

Now go buy a USB drive ;-)

神经病院院长
#3 · 2019-03-16 01:52

Here's a slight improvement over my first answer.

If you have 100MB free, copy the last 100MB from the second file and create a third file. Truncate the second file so it is now 100MB smaller. Repeat this process until the second file has been completely decomposed into individual 100MB chunks.

Now each of those 100MB files can be appended to the first file, one at a time.
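A minimal sketch of this chunking scheme in Python (the function name is illustrative, and the 100 MB default stands in for whatever spare space you actually have; at no point is more than one spare chunk on disk):

```python
import os

def concat_with_little_space(file1, file2, chunk=100 * 1024 * 1024):
    """Peel `chunk`-sized pieces off the *end* of file2, truncating it
    after each copy so at most one spare chunk exists on disk, then
    append the pieces to file1 in the right order."""
    parts = []
    with open(file2, "r+b") as f:
        size = f.seek(0, os.SEEK_END)
        while size > 0:
            start = max(0, size - chunk)
            f.seek(start)
            data = f.read(size - start)
            name = "%s.part%d" % (file2, len(parts))
            with open(name, "wb") as part:
                part.write(data)
            parts.append(name)
            f.truncate(start)   # give the space back before the next piece
            size = start
    os.remove(file2)
    # parts[0] holds the tail of file2, so append in reverse order.
    with open(file1, "ab") as out:
        for name in reversed(parts):
            with open(name, "rb") as part:
                out.write(part.read())
            os.remove(name)     # free the chunk before file1 grows again
```

Note that the peak extra disk usage is one chunk: while a piece is being written, file2 has not yet been truncated, and while a piece is being appended, it has not yet been removed.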

放我归山
#4 · 2019-03-16 01:57

time spent figuring out clever solution involving disk-sector shuffling and file-chain manipulation: 2-4 hours

time spent acquiring/writing software to do in-place copy and truncate: 2-20 hours

times median $50/hr programmer rate: $400-$1200

cost of 1TB USB drive: $100-$200

ability to understand the phrase "opportunity cost": priceless

ゆ 、 Hurt°
#5 · 2019-03-16 01:58

Two thoughts:

If you have enough physical RAM, you could actually read the second file entirely into memory, delete it, then write it in append mode to the first file. Of course if you lose power after deleting but before completing the write, you've lost part of the second file for good.
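As a sketch (assuming the second file fits in RAM; the function name is illustrative, and `os.fsync` is used so the append actually reaches the disk before the job is considered done):

```python
import os

def concat_via_ram(file1, file2):
    """Read file2 entirely into memory (it must fit in RAM), delete it
    to free its disk space, then append the bytes to file1. Between the
    delete and a completed, synced write, the data exists only in RAM."""
    with open(file2, "rb") as f:
        data = f.read()
    os.remove(file2)                 # frees the disk space...
    with open(file1, "ab") as f:     # ...so the append has room
        f.write(data)
        f.flush()
        os.fsync(f.fileno())         # force the bytes onto the disk
```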

Temporarily reduce disk space used by OS functionality (e.g. virtual memory, "recycle bin" or similar). Probably only of use on Windows.

唯我独甜
#6 · 2019-03-16 02:00

Not very efficient, but I think it can be done.

Open the first file in append mode, and copy blocks from the second file into it until the disk is almost full. Then, using random-access I/O, shift the remaining blocks of the second file down to the beginning of that same file, truncate it after the last shifted block, and repeat until the second file is empty.
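The steps above can be sketched as follows (the function name, byte `budget`, and block size are illustrative; `budget` plays the role of "almost full" and must be positive):

```python
import os

def append_with_shift(file1, file2, budget, block=1024 * 1024):
    """Append file2 to file1 using at most `budget` spare bytes of disk.
    Each round: append up to `budget` bytes from the front of file2,
    then slide the unread tail of file2 down to offset 0 and truncate,
    reclaiming the space for the next round."""
    with open(file1, "ab") as out, open(file2, "r+b") as src:
        while True:
            size = src.seek(0, os.SEEK_END)
            if size == 0:
                break
            # Phase 1: append up to `budget` bytes from the front of file2.
            src.seek(0)
            copied = 0
            while copied < min(budget, size):
                data = src.read(min(block, budget - copied, size - copied))
                out.write(data)
                copied += len(data)
            out.flush()
            # Phase 2: shift the remaining tail down to the start of file2.
            read_pos, write_pos = copied, 0
            while read_pos < size:
                src.seek(read_pos)
                data = src.read(min(block, size - read_pos))
                src.seek(write_pos)
                src.write(data)
                read_pos += len(data)
                write_pos += len(data)
            src.truncate(write_pos)  # file2 is now `copied` bytes smaller
    os.remove(file2)
```

As the answer says, this is not efficient: every byte after the first `budget` bytes of the second file is rewritten once per round, so the shifting cost grows as the spare space shrinks relative to the file size.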

查看更多