Need to know pros and cons of using RAMDirectory

Posted 2019-03-19 16:24

I need to improve the performance of my Lucene search queries. Can I use RAMDirectory? Does it improve performance? Is there an index size limit for it? I would appreciate it if someone could list the pros and cons of using a RAMDirectory.

Thanks.

3 answers
ら.Afraid
#2 · 2019-03-19 16:29

A RAMDirectory is faster, but doesn't get written to the disk. It only exists as long as your program is running, and has to be created from scratch every time your program runs.

If your index is small enough to fit comfortably into RAM, and you don't update it frequently, you can maintain an index on the disk and then create a RAMDirectory from it using the RAMDirectory(Directory dir) constructor. Querying that should then be faster than querying the one on disk, once you've paid the penalty of loading it up. But do measure the difference - if the index can fit into memory as a RAMDirectory, then it can fit in the disk cache as well, so you might not see much difference.
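To make the trade-off concrete, here is a minimal sketch in plain `java.nio`. It is not Lucene code, and the class and method names are made up for illustration. It contrasts copying a file onto the Java heap up front (roughly the cost a RAMDirectory pays when it is built from a disk index) with memory-mapping the same file and letting the OS page cache hold the bytes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration only; not part of Lucene.
final class HeapVsMapped {

    // Copies the whole file onto the Java heap before returning.
    // The full read is paid up front and the bytes count against the heap (and GC).
    static byte[] loadToHeap(Path file) throws IOException {
        return Files.readAllBytes(file);
    }

    // Maps the file read-only. Pages are faulted in by the OS on first access
    // and live in the OS page cache rather than on the Java heap.
    // The mapping stays valid after the channel is closed.
    static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }
}
```

Both calls expose the same bytes. What differs is when the read cost is paid and which memory pool holds the data, which is why a warm page cache can narrow the gap.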

Ridiculous、
#3 · 2019-03-19 16:36

You should profile the use of RAMDirectory. At least in Linux, using RAMDirectory is not any faster than using the default FSDirectory, due to the way the OS buffers I/O.
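One simple way to profile it, as a rough sketch: time every query and summarise the average, minimum and maximum latency, then run the same keyword list against each directory type. The `LatencyStats` class and the `search` callback are hypothetical names for this example. You would plug your own IndexSearcher call into the callback.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper for illustration only; not part of Lucene.
final class LatencyStats {
    final double avgMs;
    final double minMs;
    final double maxMs;

    LatencyStats(double avgMs, double minMs, double maxMs) {
        this.avgMs = avgMs;
        this.minMs = minMs;
        this.maxMs = maxMs;
    }

    // Summarises per-query latencies given in nanoseconds, reporting milliseconds.
    static LatencyStats summarize(long[] nanos) {
        if (nanos.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        long sum = 0;
        for (long n : nanos) {
            min = Math.min(min, n);
            max = Math.max(max, n);
            sum += n;
        }
        return new LatencyStats(sum / 1e6 / nanos.length, min / 1e6, max / 1e6);
    }

    // Runs `search` once per keyword and records the wall-clock time of each call.
    static LatencyStats measure(List<String> keywords, Consumer<String> search) {
        long[] nanos = new long[keywords.size()];
        for (int i = 0; i < nanos.length; i++) {
            long start = System.nanoTime();
            search.accept(keywords.get(i));
            nanos[i] = System.nanoTime() - start;
        }
        return summarize(nanos);
    }
}
```

Run the measurement more than once per setup and again after restarting the app, since the first pass includes JVM warm-up and a cold OS file cache.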

女痞
#4 · 2019-03-19 16:41

I compared FSDirectory and RAMDirectory.

  • Index size: 1.4 GB
  • CentOS, 5 GB memory

I searched 1000 keywords. The average/min/max response times (ms) were:

  • FSDirectory
    • first run: 351/7/2611
    • second run: 47/7/837
    • third run (after app restart): 53/7/2343
  • RAMDirectory
    • first run: 38/7/1133
    • second run: 34/7/189
    • third run (after app restart): 38/7/959

So RAMDirectory is faster than FSDirectory, but once the OS file cache has warmed up, the gap is not that large. What are the disadvantages of RAMDirectory? In my test:

  • It uses much more memory: the 1.4 GB index needed about 2 GB once loaded into memory, while FSDirectory used only 700 MB. That in turn means longer full GC pauses.
  • It takes longer to load, especially when the index is large, because the data has to be copied from file into memory when the index is opened. Requests are therefore blocked for longer when the app restarts.
  • It is not very practical to keep two indexes in memory at the same time. Our app switches indexes every few hours, and we want the new index to warm up while the old one is still serving requests in the same Tomcat.