How can I limit the cache used by copying so there is still memory available for other caches?

Posted 2019-03-18 00:54

Basic situation:

I am copying some NTFS disks in openSUSE. Each one is 2 TB. When I do this, the system runs slowly.

My guesses:

I believe it is likely due to caching. Linux decides to discard useful cache (e.g. KDE 4 bloat, virtual machine disks, LibreOffice binaries, Thunderbird binaries, etc.) and instead fills all available memory (24 GB total) with data from the disks being copied, which will be read only once, then written and never used again. So any time I use these apps (or KDE 4), the disk needs to be read again, and reading the bloat off the disk again makes things freeze/hiccup.

Because the cache is gone and these bloated applications need lots of cache, the system becomes horribly slow.

Since it is USB, the disk and disk controller are not the bottleneck, so using ionice does not make it faster.

I believe it is the cache rather than just the motherboard being too slow, because if I stop everything copying, it still runs choppily for a while until it recaches everything, and if I restart the copying, it takes a minute before it is choppy again. Also, I can limit the copy to around 40 MB/s and it runs faster again (not because it has the right things cached, but because the motherboard buses have lots of extra bandwidth for the system disks). I can fully accept a performance loss when my motherboard's I/O capability is completely consumed (100% used, meaning 0% wasted power, which makes me happy), but I can't accept that this caching mechanism performs so terribly in this specific use case.

# free
             total       used       free     shared    buffers     cached
Mem:      24731556   24531876     199680          0    8834056   12998916
-/+ buffers/cache:    2698904   22032652
Swap:      4194300      24764    4169536

I also tried the same thing on Ubuntu, which causes a total system hang instead. ;)

And to clarify, I am not asking how to leave memory free for the "system", but for "cache". I know that cache memory is automatically given back to the system when needed, but my problem is that it is not reserved for caching of specific things.

Question:

Is there some way to tell these copy operations to limit their memory usage so that some important things remain cached, and any slowdowns are the result of normal disk usage rather than rereading the same commonly used files? For example, is there a setting for the maximum amount of memory a process/user/filesystem is allowed to use as cache/buffers?

7 Answers
SAY GOODBYE
#2 · 2019-03-18 01:34

Kristof Provost was very close, but in my situation, I didn't want to use dd or write my own software, so the solution was to use the "--drop-cache" option in rsync.

I have used this many times since creating this question, and it seems to fix the problem completely. One exception was when using rsync to copy from a FreeBSD machine, whose rsync doesn't support "--drop-cache". So I wrote a wrapper to replace the /usr/local/bin/rsync command there and strip that option, and now copying from that machine works too.
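
For reference, assuming an rsync build that accepts "--drop-cache" (paths below are placeholders), the copy itself is just an ordinary invocation with that extra flag:

$ rsync -a --drop-cache /source/ /destination/

And here is a minimal sketch of what such a wrapper could look like; the answer doesn't show the actual script, and /usr/local/bin/rsync.real is a hypothetical name for the renamed original binary:

#!/bin/sh
# Installed as /usr/local/bin/rsync on the FreeBSD side: drop the unsupported
# --drop-cache option and hand everything else to the real rsync.
for arg in "$@"; do
    shift
    [ "$arg" = "--drop-cache" ] || set -- "$@" "$arg"
done
exec /usr/local/bin/rsync.real "$@"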

It still uses a huge amount of memory for buffers and seems to keep almost no cache, but it works smoothly anyway.

$ free
             total       used       free     shared    buffers     cached
Mem:      24731544   24531576     199968          0   15349680     850624
-/+ buffers/cache:    8331272   16400272
Swap:      4194300     602648    3591652
Juvenile、少年°
#3 · 2019-03-18 01:35

Try using dd instead of cp.

Or mount the filesystem with the sync flag.
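
For illustration only, with placeholder device names and paths (the answer doesn't give concrete commands), those two suggestions look something like:

$ dd if=/dev/sdX of=/mnt/backup/disk.img bs=1M
$ mount -o sync /dev/sdY1 /mnt/destination

Note that plain dd still goes through the page cache; GNU dd's iflag=direct/oflag=direct options can bypass it entirely, though that goes beyond what this answer suggests.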

I'm not completely sure whether these methods bypass the cache, but it may be worth a try.

Just my 2c.

走好不送
#4 · 2019-03-18 01:43

I am copying some NTFS disks [...] the system runs slow. [...] Since it is USB [...]

The slowdown is a known memory management issue.

Use a newer Linux kernel. Older ones have a problem with USB data and "Transparent Huge Pages"; see this LWN article. Very recently this issue was addressed; see "Memory Management" in LinuxChanges.
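
As a quick check (not something the answer spells out), you can see whether transparent huge pages are active on the running kernel; the value in square brackets in the output is the current setting:

$ cat /sys/kernel/mm/transparent_hugepage/enabled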

Ridiculous、
#5 · 2019-03-18 01:46

OK, now that I know you're using rsync, I could dig a bit more:

It seems that rsync is inefficient when used with tons of files at the same time; there's an entry about it in their FAQ. It's not a Linux/cache problem, it's rsync eating too much RAM.

Googling around, someone recommended splitting the syncing into multiple rsync invocations.
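
A rough sketch of that idea, with hypothetical source and destination paths (the answer doesn't give a concrete command):

# One rsync per top-level directory instead of a single huge invocation.
for dir in /source/*/; do
    rsync -a "$dir" "/destination/$(basename "$dir")/"
done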

Hope it helps.

Viruses.
#6 · 2019-03-18 01:49

It's not possible if you're using plain old cp, but if you're willing to re-implement or patch it yourself, setting posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE) on both the input and output files will probably help.

posix_fadvise() tells the kernel about your intended access pattern. In this case, you'd only use the data once, so there is no point in caching it. The Linux kernel honours these flags, so it shouldn't be caching the data any more.

干净又极端
#7 · 2019-03-18 01:50

The kernel cannot know that you won't use the cached data from the copy again. That is your information advantage.

But you could set the swappiness to 0: sudo sysctl vm.swappiness=0. This will cause Linux to drop the cache before libraries etc. are written to swap.
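
If you want the setting to survive a reboot (the answer only shows the one-off sysctl call), the usual place is /etc/sysctl.conf or a drop-in file under /etc/sysctl.d/, for example:

# /etc/sysctl.d/99-swappiness.conf  (hypothetical file name)
vm.swappiness = 0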

Works nicely for me too; it is especially performant in combination with a huge amount of RAM (16-32 GB).
