Riak Map Reduce in JS returning limited data

I have Riak running on two clustered EC2 servers and use Python to run JavaScript MapReduce jobs. The setup is mainly a proof of concept.

There are 50 keys in the bucket, and all the map/reduce function does is reformat the data. This is only for testing the map/reduce functionality in Riak.

Problem: The output only shows [{u'e': 2, u'undefined': 2, u'w': 2}], which is completely wrong. The logs show that all the keys have been processed, but only 2 get returned. Why is that happening, and am I missing something important?

Code:

import riak
client = riak.RiakClient()
query = riak.RiakMapReduce(client).add('raw_hits10')
query.map("""function(v) {
      var data = JSON.parse(v.values[0].data);
      return [[data, 1]];
}""")
query.reduce("""function(vk) {
         var res = {};
         for (var indx in vk) {
            var key_t = vk[indx][0];
            var val_t = vk[indx][1];
            ejsLog('/tmp/map_reduce.log', key_t + "--- " + val_t);

            res[key_t] = 2;
         }
         return [res]
    }
      """)


for res in query.run():
    print res

The results from printing:

[{u'e': 2, u'undefined': 2, u'w': 2}]

This makes no sense

Tags: javascript mapreduce riak

1 Answer

爷、活的狠高调

Post #2 · 2019-07-19 03:38

To avoid loading all the data from the preceding phase into memory on the coordinating node before running the reduce phase (which would be problematic for large MapReduce jobs), Riak runs the reduce function multiple times. Each iteration receives a batch of results from the preceding phase together with any output from earlier iterations of the reduce phase. The default batch size is 20, but this is configurable. Because the results of one reduce iteration are fed in as input to the next, reduce functions need to be designed to handle this, and some strategies are described here. In your code the reduce function returns a single object rather than [key, value] pairs, so on a later iteration vk[indx][0] is undefined for that carried-over entry, which is where the 'undefined' key comes from.
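To make the batching concrete, here is a minimal Python sketch that only simulates the behaviour described above; it does not talk to Riak. The helper run_reduce_in_batches, the function rereduce_safe, and the "page-N" keys are all hypothetical names chosen for illustration. The point it tries to show is that a reduce function whose output has the same shape as its input ([key, count] pairs) gives the same answer whether it runs once or in several batches.

```python
def run_reduce_in_batches(reduce_fn, map_outputs, batch_size=20):
    """Simulate batched reduce: each call gets the earlier reduce
    output plus the next batch of map results (hypothetical helper)."""
    carried = []
    for start in range(0, len(map_outputs), batch_size):
        batch = map_outputs[start:start + batch_size]
        carried = reduce_fn(carried + batch)
    return carried


def rereduce_safe(values):
    """Sum counts per key. Input and output are both lists of
    [key, count] pairs, so earlier output can be fed back in."""
    totals = {}
    for key, count in values:
        totals[key] = totals.get(key, 0) + count
    return [[key, total] for key, total in sorted(totals.items())]


# 50 made-up map results, shaped like the [[data, 1]] pairs in the question
map_outputs = [["page-%d" % (i % 5), 1] for i in range(50)]

batched = run_reduce_in_batches(rereduce_safe, map_outputs)
single_pass = rereduce_safe(map_outputs)
print(batched)
```

The same rule should carry over to the JavaScript reduce function in the question: return [key, count] pairs and add the counts together, instead of returning one object and overwriting each entry with a constant.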

It is also possible to force Riak to only run the reduce phase once for the entire input set by specifying the 'reduce_phase_only_1' parameter, but this is generally not recommended, especially for large jobs.
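As a rough sketch of where such a parameter would go, the snippet below builds a job description for Riak's HTTP MapReduce interface as a plain Python dict. The placement of the flag under the reduce phase's "arg" key is my assumption about the JSON layout, and the short reduce source is only a placeholder, so check both against the Riak documentation for your version before relying on them.

```python
import json

# Hypothetical job spec; nothing is sent to Riak here.
job = {
    "inputs": "raw_hits10",
    "query": [
        {"map": {
            "language": "javascript",
            "source": "function(v) { return [[JSON.parse(v.values[0].data), 1]]; }",
        }},
        {"reduce": {
            "language": "javascript",
            # Placeholder reduce function, for illustration only
            "source": "function(vk) { return [vk.length]; }",
            # Assumed location of the flag: the phase's static argument
            "arg": {"reduce_phase_only_1": True},
        }},
    ],
}

body = json.dumps(job)
print(body)
```

Even if this works for a 50-key test bucket, writing the reduce function so it tolerates repeated runs is the safer fix, for the reason given above.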
