How to make use of the filesystem cache in Java or Python?

A recent blog post on the Elasticsearch website discusses the features of their new 1.4 beta release.

I am very curious about how they make use of the filesystem cache:

Recent releases have added support for doc values. Essentially, doc values provide the same function as in-memory fielddata, but they are written to disk at index time. The benefit that they provide is that they consume very little heap space. Doc values are read from disk, instead of from memory. While disk access is slow, doc values benefit from the kernel’s filesystem cache. The filesystem cache, unlike the JVM heap, is not constrained by the 32GB limit. By shifting fielddata from the heap to the filesystem cache, you can use smaller heaps which means faster garbage collections and thus more stable nodes.

Before this release, doc values were significantly slower than in-memory fielddata. The changes in this release have improved the performance significantly, making them almost as fast as in-memory fielddata.

Does this mean that we can actively control the behavior of the filesystem cache instead of passively waiting for the OS to do its work? If so, how can we make use of the filesystem cache in normal application development? Say I'm writing a Python or Java program: how would I do this?

Tags: java python performance caching elasticsearch

1 answer

不美不萌又怎样

Reply #2 · 2020-05-24 07:11

The filesystem cache is an implementation detail of the OS's inner workings and is transparent to the end user. It isn't something that needs adjusting or changing. Lucene already makes use of the filesystem cache when it manages the index segments. Every time something is indexed into Lucene (via Elasticsearch), those documents are written to segments. The segments are first written to the filesystem cache, and only after some time (for example, when the translog, which keeps track of the documents being indexed, is full) is the cached content written to an actual file on disk. While the documents are still only in the filesystem cache, they can already be accessed.
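A minimal Java sketch of the behavior described above, assuming a local filesystem on a typical OS: bytes written through a `FileChannel` are already visible to a second, independent reader before any explicit flush, and `FileChannel.force(true)` is the call that asks the OS to persist them. This is not Lucene's code; the class name `PageCacheWriteDemo` and the temp-file names are made up for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PageCacheWriteDemo {

    // Writes `text` without forcing it to disk, reads it back through a second,
    // independent channel, and only then calls force() to request durability.
    static String writeReadThenSync(Path file, String text) throws IOException {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        try (FileChannel writer = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer src = ByteBuffer.wrap(data);
            while (src.hasRemaining()) {
                writer.write(src); // hands the bytes to the kernel; no fsync yet
            }

            // A second reader already sees the data, typically served from the OS page cache.
            String readBack;
            try (FileChannel reader = FileChannel.open(file, StandardOpenOption.READ)) {
                ByteBuffer dst = ByteBuffer.allocate(data.length);
                while (dst.hasRemaining()) {
                    if (reader.read(dst) < 0) {
                        break; // end of file
                    }
                }
                readBack = new String(dst.array(), 0, dst.position(), StandardCharsets.UTF_8);
            }

            writer.force(true); // fsync: ask the OS to persist file data and metadata
            return readBack;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("segment", ".bin");
        file.toFile().deleteOnExit();
        System.out.println(writeReadThenSync(file, "doc-1,doc-2,doc-3"));
    }
}
```

Running it prints `doc-1,doc-2,doc-3`, which was read back before `force` was called.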

The doc values improvement in this release relies on the same mechanism: doc values are read from disk, kept in the filesystem cache, and accessed from there instead of taking up heap space.
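To make the doc values idea concrete, here is a toy sketch (the class `ToyDocValues` is hypothetical and this is not Elasticsearch's actual on-disk format): one `long` per document is written to a column file at "index time", and at "query time" the file is memory-mapped so a value can be looked up by docId without loading the whole column into a heap array. It uses the `FileChannel.map` mechanism discussed further down in this answer.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.LongBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ToyDocValues {

    // "Index time": write one long per document, in docId order, to a column file.
    static void writeColumn(Path file, long[] valuesByDocId) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(valuesByDocId.length * Long.BYTES);
        for (long v : valuesByDocId) {
            buf.putLong(v);
        }
        buf.flip();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        }
    }

    // "Query time": map the column and view it as longs. The values are not copied
    // into a heap array; reads go through the mapping, backed by the page cache.
    static LongBuffer openColumn(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return mapped.asLongBuffer(); // the mapping stays valid after the channel closes
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("price", ".dv");
        file.toFile().deleteOnExit();
        writeColumn(file, new long[] {42L, 7L, 1999L});
        LongBuffer prices = openColumn(file);
        System.out.println(prices.get(2)); // value stored for docId 2
    }
}
```

Running `main` prints `1999`, the value stored for docId 2. Both sides use the default big-endian byte order, so the written and mapped views agree.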

How Lucene accesses the filesystem cache is described in this excellent blog post:

In our previous approaches, we were relying on using a syscall to copy the data between the file system cache and our local Java heap. How about directly accessing the file system cache? This is what mmap does!

Basically mmap does the same like handling the Lucene index as a swap file. The mmap() syscall tells the O/S kernel to virtually map our whole index files into the previously described virtual address space, and make them look like RAM available to our Lucene process. We can then access our index file on disk just like it would be a large byte[] array (in Java this is encapsulated by a ByteBuffer interface to make it safe for use by Java code). If we access this virtual address space from the Lucene code we don’t need to do any syscalls, the processor’s MMU and TLB handles all the mapping for us. If the data is only on disk, the MMU will cause an interrupt and the O/S kernel will load the data into file system cache. If it is already in cache, MMU/TLB map it directly to the physical memory in file system cache.
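For contrast, the "previous approach" the quoted post mentions (copying data from the filesystem cache into the Java heap through a read call) looks roughly like this in plain Java NIO. The class `CopyingReadDemo` and the helper `readAt` are hypothetical names. The JDK may route a heap `ByteBuffer` through an internal temporary buffer, so treat this as an illustration of the copy path rather than a precise model of the syscalls made.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class CopyingReadDemo {

    // Reads up to `length` bytes at `offset` by asking the kernel to copy them from
    // the page cache (or from disk, on a cache miss) into a buffer on the Java heap.
    static byte[] readAt(FileChannel ch, long offset, int length) throws IOException {
        ByteBuffer heapBuf = ByteBuffer.allocate(length); // heap-backed buffer
        long pos = offset;
        while (heapBuf.hasRemaining()) {
            int n = ch.read(heapBuf, pos); // positional read; one read call per iteration
            if (n < 0) {
                break; // end of file
            }
            pos += n;
        }
        byte[] out = new byte[heapBuf.position()];
        heapBuf.flip();
        heapBuf.get(out); // yet another copy, from the buffer into a plain byte[]
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("index", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[] {10, 20, 30, 40, 50});
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            System.out.println(Arrays.toString(readAt(ch, 1, 3))); // [20, 30, 40]
        }
    }
}
```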

As for actually using mmap in a Java program, I think this is the class and method to do so.
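A sketch of the mmap route using the JDK's `FileChannel.map`, which returns a `MappedByteBuffer`. Assumptions: the file is not modified or truncated while mapped, and the class `MmapReader` and its `chunkSize` parameter are hypothetical. A single `MappedByteBuffer` is indexed by `int`, so `map` rejects regions larger than `Integer.MAX_VALUE` bytes; the sketch therefore splits the file into several mappings, which as far as I know is also what Lucene's `MMapDirectory` does for large files. `MappedByteBuffer.load()` is only a hint to bring the pages into memory ahead of time.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapReader {
    private final MappedByteBuffer[] chunks;
    private final int chunkSize;
    private final long length;

    MmapReader(Path file, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
        this.length = Files.size(file);
        this.chunks = mapChunks(file, length, chunkSize);
    }

    // Maps the whole file read-only as consecutive regions of at most chunkSize bytes.
    private static MappedByteBuffer[] mapChunks(Path file, long length, int chunkSize)
            throws IOException {
        int count = (int) ((length + chunkSize - 1) / chunkSize);
        MappedByteBuffer[] result = new MappedByteBuffer[count];
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            for (int i = 0; i < count; i++) {
                long start = (long) i * chunkSize;
                long size = Math.min(chunkSize, length - start);
                result[i] = ch.map(FileChannel.MapMode.READ_ONLY, start, size);
            }
        }
        return result; // mappings remain valid after the channel is closed
    }

    // Random access like one big byte[]: no read() call is made; touching a page
    // that is not resident triggers a page fault and the kernel loads it.
    byte get(long pos) {
        if (pos < 0 || pos >= length) {
            throw new IndexOutOfBoundsException("position " + pos);
        }
        return chunks[(int) (pos / chunkSize)].get((int) (pos % chunkSize));
    }

    long length() {
        return length;
    }

    // Optional: ask the OS to load every mapped page into physical memory now.
    void preload() {
        for (MappedByteBuffer chunk : chunks) {
            chunk.load();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("segment", ".bin");
        file.toFile().deleteOnExit();
        byte[] data = new byte[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i % 251);
        }
        Files.write(file, data);

        MmapReader reader = new MmapReader(file, 4096); // tiny chunks, just for the demo
        reader.preload();
        long sum = 0;
        for (long p = 0; p < reader.length(); p++) {
            sum += reader.get(p);
        }
        System.out.println("bytes=" + reader.length() + " sum=" + sum);
    }
}
```

Per the `MappedByteBuffer` documentation, a mapping remains valid until the buffer itself is garbage-collected; there is no public method on it to unmap explicitly, which is worth keeping in mind when mapping many large files.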
