Riak database fails after a short period

Posted 2019-05-29 02:17

Question:

I created a simple Erlang application which periodically collects the required data and puts it in a Riak database.

When I start my application it runs smoothly, but after a while it gets stuck because PUT requests to the Riak database become too slow. Here are the logs from my app:

2013-06-26 12:44:09.090 [info] <0.60.0> data processed in [16476 ms]
2013-06-26 12:45:51.472 [info] <0.60.0> data processed in [18793 ms]
...
2013-06-26 12:57:28.138 [info] <0.60.0> data processed in [15135 ms]
2013-06-26 13:07:01.484 [info] <0.60.0> data processed in [488420 ms]
2013-06-26 14:03:11.561 [info] <0.60.0> data processed in [3370075 ms]

In the Riak crash logs I can see a lot of messages like this:

2013-06-26 17:06:20 =CRASH REPORT====
crasher:
initial call: riak_kv_index_hashtree:init/1
pid: <0.13660.7>
registered_name: []
exception exit: {{{badmatch,{error,{db_open,"IO error: ./data/anti_entropy/
    433883298582611803841718934712646521460354973696/MANIFEST-000004: 
    Cannot allocate memory"}}}, [{hashtree,new_segment_store,2,
    [{file,"src/hashtree.erl"},{line,499}]},
    {hashtree,new,2,[{file,"src/hashtree.erl"},{line,215}]},
    {riak_kv_index_hashtree,do_new_tree,2,
    [{file,"src/riak_kv_index_hashtree.erl"},
    {line,426}]},{lists,foldl,3,[{file,"lists.erl"},
    {line,1197}]},{riak_kv_index_hashtree,
    init_trees,2,[{file,"src/riak_kv_index_hashtree.erl"},
    {line,368}]},{riak_kv_index_hashtree,init,1,
    [{file,"src/riak_kv_index_hashtree.erl"},
    {line,225}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,304}]},
    {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]},
    [{gen_server,init_it,6,[{file,"gen_server.erl"},{line,328}]},
    {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
ancestors: [<0.955.0>,riak_core_vnode_sup,riak_core_sup,<0.129.0>]
messages: []
links: []
dictionary: []
trap_exit: false
status: running
heap_size: 1597
stack_size: 24
reductions: 593
neighbours:

I see the same behavior on Amazon AWS and on a local virtual machine. My VMs are quite small, with 512-1024 MB of RAM; the AWS instance is a Micro, so it has about the same amount of memory.

There is no cluster at the moment, just a single node with Riak and my app running on it.

I've checked the Riak documentation, and the basic things it recommends are to increase ulimit and to update sysctl. On my server, ulimit -n shows 65536, and sysctl has been updated as recommended.

I've tried both bitcask and eleveldb, but the result is the same.

Currently I can't figure out what is broken and why Riak reports "Cannot allocate memory". Thanks.

Answer 1:

1GB RAM is quite small for a Riak node, and even more so as you are also running your application there. The default settings in Riak are targeted at environments with considerably more RAM and processing power, so you will need to tweak the default settings in order to get it to work. Here are a few pointers that may help:

  1. As you only have one node, disable AAE by setting {anti_entropy, {off, []}}.
  2. Reduce the ring size. This will limit your ability to scale out, but is most likely required in order to get it to work. A suitable starting value could perhaps be 16, but possibly even as low as 8.
  3. Change default bucket properties so that you have n_val, r, w, dw and rw all set to 1, as you otherwise will be writing multiple copies of every record to disk. These will need to be increased when you scale out and add more nodes.
  4. As bitcask requires all keys to be kept in memory, it is probably a good idea to instead use leveldb as a backend. You will however most likely need to reduce the size of write buffers as well as the cache significantly. You may need to experiment to find a suitable level.
  5. As this environment most likely is too small to be able to run mapreduce on anyway, you can also set the map_js_vm_count and reduce_js_vm_count configuration parameters to 0 in order to save some additional memory.
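
As a rough sketch, the pointers above might translate into an app.config fragment like the one below. It assumes a Riak 1.x node configured through app.config, and the settings would need to be merged into the existing riak_core, riak_kv and eleveldb sections rather than pasted in as a whole file. The numeric values for the ring size, write buffers and cache are illustrative starting points only, not tested recommendations.

```erlang
%% Sketch of an app.config fragment for a small single-node setup.
%% All numeric values are assumptions to experiment with.
[
 {riak_core, [
   %% Pointer 2: smaller ring. This is read when the ring is first
   %% created, so it needs to be set before the node's first start.
   {ring_creation_size, 16},

   %% Pointer 3: keep a single copy of each object on a single node.
   {default_bucket_props, [{n_val, 1}, {r, 1}, {w, 1}, {dw, 1}, {rw, 1}]}
 ]},

 {riak_kv, [
   %% Pointer 1: disable active anti-entropy (AAE).
   {anti_entropy, {off, []}},

   %% Pointer 4: use leveldb instead of bitcask.
   {storage_backend, riak_kv_eleveldb_backend},

   %% Pointer 5: no JavaScript VMs for MapReduce.
   {map_js_vm_count, 0},
   {reduce_js_vm_count, 0}
 ]},

 {eleveldb, [
   %% Pointer 4: smaller write buffers and cache (sizes in bytes).
   {write_buffer_size_min, 4194304},
   {write_buffer_size_max, 8388608},
   {cache_size, 4194304}
 ]}
].
```

Riak needs to be restarted for these settings to be picked up. On a node that has already been started, the existing ring would most likely have to be recreated before the new ring size applies.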


Tags: erlang riak