Remove beginning of file without rewriting the whole file

Posted 2020-02-25 07:24

Question:

I have an embedded Linux system that stores data in a very large file, appending new data to the end. As the file grows close to filling the available storage space, I need to remove the oldest data.

The problem is that I can't really accept the disruption of moving the massive bulk of data "up" the file the usual way, i.e. locking the file for an extended period of time just to rewrite it (and since this is a flash medium, that would also cause unnecessary wear to the flash).

Probably the easiest way would be to split the file into multiple smaller ones, but this has several downsides related to how the data is handled and processed: all the 'client end' software expects a single file. On the other hand, it can handle the 'corruption' of having the first record cut in half, so the file doesn't need to be trimmed at record boundaries, just 'somewhere up there', e.g. by freeing the first few blocks. The oldest data is obsolete anyway, so even more severe corruption at the beginning of the file is completely acceptable as long as the 'tail' remains clean, and liberties can be taken with exactly how much is removed: 'roughly the first several megabytes' is okay, there is no need for 'exactly the first 4096KB' precision.

Is there some method, API, trick, or hack to truncate the beginning of a file like that?

Answer 1:

You can achieve this on Linux kernel 3.15 and later, on ext4 or XFS file systems, using fallocate(2) with the FALLOC_FL_COLLAPSE_RANGE flag:

int ret = fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, 0, 4096); /* needs _GNU_SOURCE; 4096 must be a multiple of the fs block size */

See also the question "Truncating the first 100MB of a file in linux".
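A slightly fuller sketch in C, assuming the writer keeps appending to the same file and that st_blksize matches the filesystem block size (typical on ext4/XFS, but not guaranteed). The helper names collapse_len and drop_head are made up for illustration. fallocate(2) rejects a collapse range that is not block-aligned or that reaches end of file, so the requested amount is rounded down to whole blocks and kept strictly below the file size; filesystems that do not support the operation fail with EOPNOTSUPP.

```c
#define _GNU_SOURCE          /* exposes fallocate() in <fcntl.h> */
#include <fcntl.h>
#include <linux/falloc.h>    /* FALLOC_FL_COLLAPSE_RANGE */
#include <sys/stat.h>
#include <sys/types.h>

/* Round the requested amount down to whole filesystem blocks and keep it
   strictly below the file size, because COLLAPSE_RANGE returns EINVAL for
   unaligned ranges or ranges that reach end of file. */
static off_t collapse_len(off_t want, off_t file_size, off_t blksize)
{
    off_t len = (want / blksize) * blksize;
    if (len >= file_size)   /* never reach EOF: largest block multiple below file_size */
        len = ((file_size - 1) / blksize) * blksize;
    return len > 0 ? len : 0;
}

/* Drop roughly `want` bytes from the start of the file open on fd (which must
   be open for writing). Returns the number of bytes removed, 0 if nothing could
   be removed, or -1 with errno set (e.g. EOPNOTSUPP if the filesystem lacks
   support). Assumes st_blksize equals the filesystem block size. */
static off_t drop_head(int fd, off_t want)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    off_t len = collapse_len(want, st.st_size, (off_t)st.st_blksize);
    if (len == 0)
        return 0;
    if (fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, 0, len) != 0)
        return -1;
    return len;
}
```

For example, drop_head(fd, 8L << 20) on a descriptor opened with O_RDWR or O_WRONLY would remove roughly the first 8 MB. Readers that cached file offsets will see the data shift, which the question says the client software tolerates.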



Answer 2:

The easiest solution for your legacy applications would be a FUSE filesystem that gives them access to the underlying file, but with the offset cyclically shifted. This would let you implement a ring buffer at the physical level. The FUSE layer would be fairly simple, as it only needs to shift every file position by a constant, modulo the file size.
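A complete FUSE filesystem is too long to show here, but the core of this answer is the offset arithmetic. A minimal sketch of it in C, assuming a hypothetical struct ring that records a fixed physical capacity, the physical offset of the oldest retained byte (head), and how many bytes are valid. A FUSE read or write handler would translate each client offset with ring_phys and split transfers at the wrap point using ring_contig. All names here, and the bookkeeping in ring_account_append, are illustrative assumptions rather than part of any FUSE API.

```c
#include <stdint.h>

/* Hypothetical ring-buffer state kept by the FUSE layer. */
struct ring {
    uint64_t capacity;   /* fixed physical file size in bytes */
    uint64_t head;       /* physical offset where logical byte 0 lives */
    uint64_t length;     /* number of valid logical bytes (<= capacity) */
};

/* Map a logical offset (what the client sees) to a physical offset. */
static uint64_t ring_phys(const struct ring *r, uint64_t logical)
{
    return (r->head + logical) % r->capacity;
}

/* Bytes that can be transferred starting at `logical` in one contiguous
   pread/pwrite before the physical position wraps back to 0. */
static uint64_t ring_contig(const struct ring *r, uint64_t logical, uint64_t want)
{
    uint64_t until_wrap = r->capacity - ring_phys(r, logical);
    return want < until_wrap ? want : until_wrap;
}

/* Account for n appended bytes (n <= capacity). When the buffer is full,
   the newest bytes overwrite the oldest, so head advances by the overflow. */
static void ring_account_append(struct ring *r, uint64_t n)
{
    r->length += n;
    if (r->length > r->capacity) {
        uint64_t overflow = r->length - r->capacity;
        r->head = (r->head + overflow) % r->capacity;
        r->length = r->capacity;
    }
}
```

The head value would also need to be persisted (for example in a small side file) so the mapping survives a reboot; that part is left out of the sketch.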



Answer 3:

What about setting up a separate process that renames the output file when it reaches a predefined size (for instance, by appending the current Unix timestamp to the file name)?

This would let you keep the old data, and the main process will recreate the output file the next time it opens it for writing (a process that keeps the file open will continue writing to the renamed file).
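A sketch of the rotation step in C, using the hypothetical helpers rotated_name and rotate_if_large; the size limit is whatever you choose. Note that rename(2) only changes the name, so the main process has to reopen the path (with O_CREAT | O_APPEND) after rotation to start a fresh file. Two rotations within the same second would also produce the same name, and rename would silently replace the earlier copy.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Build "<path>.<unix-time>" into out.
   Returns the name's length, or -1 if it does not fit in outsz bytes. */
static int rotated_name(const char *path, time_t t, char *out, size_t outsz)
{
    int n = snprintf(out, outsz, "%s.%lld", path, (long long)t);
    if (n < 0 || (size_t)n >= outsz)
        return -1;
    return n;
}

/* If `path` has reached `limit` bytes, rename it aside with a timestamp
   suffix (the new name is written to out).
   Returns 1 if rotated, 0 if still below the limit, -1 on error. */
static int rotate_if_large(const char *path, off_t limit, char *out, size_t outsz)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    if (st.st_size < limit)
        return 0;
    if (rotated_name(path, time(NULL), out, outsz) < 0)
        return -1;
    return rename(path, out) == 0 ? 1 : -1;
}
```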

A cron job can then remove the old files periodically.
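The periodic cleanup could be a small C program run from cron. A minimal sketch, assuming the rotated files are named "<base>.<unix-time>" in a single directory, as this answer suggests; parse_rotated and remove_old_rotations are hypothetical names. Only files whose suffix is all digits and older than the given cutoff are deleted, so the live output file is never touched.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* If `name` looks like "<base>.<digits>", store the number in *ts and return 1. */
static int parse_rotated(const char *name, const char *base, long long *ts)
{
    size_t blen = strlen(base);
    if (strncmp(name, base, blen) != 0 || name[blen] != '.')
        return 0;
    const char *digits = name + blen + 1;
    if (*digits == '\0')
        return 0;
    for (const char *p = digits; *p; p++)
        if (*p < '0' || *p > '9')
            return 0;
    *ts = strtoll(digits, NULL, 10);
    return 1;
}

/* Remove rotated copies of `base` in `dir` whose timestamp is below `cutoff`.
   Returns the number of files removed, or -1 if the directory can't be opened. */
static int remove_old_rotations(const char *dir, const char *base, long long cutoff)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;
    int removed = 0;
    struct dirent *e;
    char path[4096];
    while ((e = readdir(d)) != NULL) {
        long long ts;
        if (!parse_rotated(e->d_name, base, &ts) || ts >= cutoff)
            continue;
        int n = snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;   /* path too long: skip rather than delete the wrong file */
        if (unlink(path) == 0)
            removed++;
    }
    closedir(d);
    return removed;
}
```

For example, remove_old_rotations("/data", "output.log", (long long)time(NULL) - 7 * 24 * 3600) would delete copies older than a week (the directory and file name are placeholders).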