How can you concatenate two huge files with very little spare disk space?

Posted 2019-03-16 01:18

Suppose that you have two huge files (several GB) that you want to concatenate together, but that you have very little spare disk space (let's say a couple hundred MB). That is, given file1 and file2, you want to end up with a single file which is the result of concatenating file1 and file2 together byte-for-byte, and delete the original files.

You can't do the obvious cat file2 >> file1; rm file2, since in between the two operations, you'd run out of disk space.

Solutions on any and all platforms with free or non-free tools are welcome; this is a hypothetical problem I thought up while I was downloading a Linux ISO the other day, and the download got interrupted partway through due to a wireless hiccup.

15 answers
ゆ 、 Hurt° · Answer 2 · 2019-03-16 01:41

Obviously, the economical answer is to buy more storage, assuming that's a possible answer. It might not be, though: an embedded system with no way to attach more storage, or even no access to the equipment itself, say, a space probe in flight.

The previously presented answer based on sparse files is good (other than its destructive nature if something goes wrong!) if your file system supports them. What if it doesn't, though?

Starting from the end of file 2, copy blocks to the target file, reversing the bytes of each block as you go. After each block, truncate the source file to the uncopied length. Repeat for file 1.

At this point the target file contains all the data backwards, and the source files are gone.

Read a block from the start and a block from the end of the target file, reverse them, and write each to the spot the other came from. Work your way inwards, flipping blocks.

When you are done, the target file is the concatenation of the source files. No sparse files needed, no messing with the file system needed. This can be carried out at zero bytes free, since each block is held in memory while the source is truncated.
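
A minimal Python sketch of this two-phase scheme, assuming a block size that fits comfortably in memory (the file names, block size, and function names are placeholders, and error handling is omitted):

import os

BLOCK = 1 << 20  # 1 MiB; any size that fits comfortably in memory

def append_reversed(src_path, dst_path, block=BLOCK):
    # Phase 1: move src onto the end of dst, last block first, reversing the
    # bytes of each block and truncating src as we go, so no extra disk is used.
    with open(src_path, "r+b") as src, open(dst_path, "ab") as dst:
        size = os.path.getsize(src_path)
        while size > 0:
            n = min(block, size)
            src.seek(size - n)
            chunk = src.read(n)
            src.truncate(size - n)      # give the space back before rewriting it
            dst.write(chunk[::-1])
            size -= n

def reverse_in_place(path, block=BLOCK):
    # Phase 2: reverse the whole file by swapping byte-reversed blocks
    # from both ends, working inwards.
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        lo, hi = 0, size
        while hi - lo > 2 * block:
            f.seek(lo); head = f.read(block)
            f.seek(hi - block); tail = f.read(block)
            f.seek(lo); f.write(tail[::-1])
            f.seek(hi - block); f.write(head[::-1])
            lo += block
            hi -= block
        f.seek(lo)                      # middle piece: reverse it in one go
        mid = f.read(hi - lo)
        f.seek(lo)
        f.write(mid[::-1])

append_reversed("file2", "target")      # target = reverse(file2)
append_reversed("file1", "target")      # target = reverse(file1 + file2)
os.remove("file1"); os.remove("file2")  # both are empty by now
reverse_in_place("target")              # target = file1 + file2

The only scratch space used is the block or two held in memory; on disk, one file shrinks as the other grows.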

老娘就宠你 · Answer 3 · 2019-03-16 01:43

I doubt this is a direct answer to the question; consider it an alternative way to solve the problem.

I think it is possible to treat the second file as part 2 of the first file. With zip applications, we often see a huge file split into multiple parts; if you open the first part, the application automatically brings in the other parts as it processes the archive.

We can simulate the same thing here. As @edg pointed out, tinkering with the file system would be one way; a reader that stitches the parts together is another, as sketched below.
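
As a rough illustration of that "part 2" idea, here is a hypothetical Python reader (the class name is made up) that presents the two files as one logical stream, so downstream code never needs the physical concatenation:

import io

class MultiPartReader(io.RawIOBase):
    # Chains several on-disk parts into a single readable stream.
    def __init__(self, *paths):
        self._parts = [open(p, "rb") for p in paths]
        self._idx = 0
    def readable(self):
        return True
    def readinto(self, buf):
        while self._idx < len(self._parts):
            n = self._parts[self._idx].readinto(buf)
            if n:                        # got some bytes from the current part
                return n
            self._idx += 1               # current part exhausted, move on
        return 0                         # all parts exhausted: EOF
    def close(self):
        for f in self._parts:
            f.close()
        super().close()

# with MultiPartReader("file1", "file2") as whole:
#     chunk = whole.read(65536)

This works for anything that only needs to read the data sequentially; it obviously doesn't help if some other program insists on a single real file on disk.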

Evening l夕情丶 · Answer 4 · 2019-03-16 01:45

You could do this:

head --bytes=1024 file2 >> file1 && dd if=file2 of=file2 bs=1024 skip=1 conv=notrunc status=none && truncate --size=-1024 file2

(The tempting tail --bytes=+1025 file2 > file2 doesn't work, because the redirection truncates file2 before tail ever reads it; the dd instead shifts the remaining bytes down to the start of file2, and truncate chops off the now-duplicated tail.)

You can increase 1024 (in all three places) according to how much extra disk space you have, then just repeat this until all the bytes have been moved and file2 can be removed.

This is probably the fastest way to do it (in terms of development time).
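
If you prefer to spell the loop out, here is a hedged Python sketch of the same idea (the paths and the chunk size are placeholders): append a chunk from the front of file2 to file1, shift the remainder of file2 down, shrink it, and repeat. Note that file2 gets rewritten on every pass, so this is simple rather than fast.

import os

CHUNK = 100 * 1024 * 1024   # roughly how much spare disk space you can afford

def move_in_chunks(dst_path, src_path, chunk=CHUNK):
    # Append the first `chunk` bytes of src to dst, then shift the rest of
    # src down to offset 0 and truncate it, so peak extra usage stays ~chunk.
    with open(dst_path, "ab") as dst, open(src_path, "r+b") as src:
        while True:
            src.seek(0)
            head = src.read(chunk)
            if not head:
                break
            dst.write(head)
            dst.flush()
            read_at, write_at = len(head), 0
            while True:                  # shift the remainder down...
                src.seek(read_at)
                block = src.read(chunk)
                if not block:
                    break
                src.seek(write_at)
                src.write(block)
                read_at += len(block)
                write_at += len(block)
            src.truncate(write_at)       # ...and chop off the duplicated tail
    os.remove(src_path)

# move_in_chunks("file1", "file2")       # afterwards file1 holds file1 + file2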

Melony? · Answer 5 · 2019-03-16 01:46

OK, changing the problem a little bit. Chances are there's other stuff on the disk that you don't need, but you don't know what it is or where it is. If you could find it, you could delete it, and then maybe you'd have enough extra space.

To find these "tumors", whether a few big ones, or lots of little ones, I use a little sampling program. Starting from the top of a directory (or the root) it makes two passes. In pass 1, it walks the directory tree, adding up the sizes of all the files to get a total of N bytes. In pass 2, it again walks the directory tree, pretending it is reading every file. Every time it passes N/20 bytes, it prints out the directory path and name of the file it is "reading". So the end result is 20 deep samples of path names uniformly spread over all the bytes under the directory.

Then just look at that list for stuff that shows up a lot that you don't need, and go blow it away.

(It's the space-equivalent of the sampling method I use for performance optimization.)
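
A rough Python sketch of such a sampler, using the 20-sample count described above (pass 1 records every file's size, pass 2 replays that walk and prints the file being "read" every total/20 bytes):

import os

SAMPLES = 20

def sample_disk_usage(root, samples=SAMPLES):
    # Pass 1: walk the tree and total up file sizes.
    sizes = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((path, os.path.getsize(path)))
            except OSError:
                pass                     # vanished or unreadable: skip it
    total = sum(s for _, s in sizes)
    if total == 0:
        return
    # Pass 2: pretend to read every file, printing a pathname every total/samples bytes.
    step = total / samples
    seen, next_mark = 0, step
    for path, size in sizes:
        seen += size
        while seen >= next_mark:
            print(path)                  # a huge file shows up more than once
            next_mark += step

sample_disk_usage(".")                   # point it at whatever directory you suspect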

劫难 · Answer 6 · 2019-03-16 01:47

With those constraints I expect you'd need to tamper with the file system; directly edit the file size and allocation blocks.

In other words, forget about shuffling any blocks of file content around, just edit the information about those files.

何必那么认真 · Answer 7 · 2019-03-16 01:49

If the files are highly compressible (e.g. logs):

gzip file1
gzip file2
zcat file1.gz file2.gz | gzip > file3.gz
rm file1.gz file2.gz
gunzip file3.gz

(Note the intermediate step still needs room for a second compressed copy, and the final gunzip needs room for the uncompressed result while file3.gz is still on disk, so the files have to compress very well for this to fit.)