MapReduce aggregation based on attributes containe

Posted 2019-08-07 09:10

Say I have a collection of 'activities', each of which has a name, cost and location:

{_id : 1 , name: 'swimming', cost: '3.40', location: 'kirkstall'}
{_id : 2 , name: 'cinema', cost: '6.50', location: 'hyde park'}
{_id : 3 , name: 'gig', cost: '10.00', location: 'hyde park'}

I also have a people collection which records, for each person, how many times they plan to do each activity in a year:

{_id : 1 , name: 'russell', activities : {1 : 9 , 2 : 4 , 3 : 21}}

I don't want to denormalise the activities' attributes by putting them in the person collection for a number of reasons.

First of all, this is about planning, so if the cost of an activity changes, it would need to change in the person collection too. So I'd have to update all person records.

Secondly, I will probably want to add some other attributes to the activity collection at some point, and want to avoid having to add them to every activity in every record in the person collection when I do.

However, now I want to do a MapReduce to find out how many activities are planned in total by all people, grouped by location.

This means that during a MapReduce on the person collection I need to know the location of the activities they have planned. Can anyone think of a nice way to do this?
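For reference, the result being asked for can be sketched in plain JavaScript over the sample documents above, with no MongoDB involved. This assumes the activities field on a person is a map of activity _id to planned count, and totalsByLocation is a made-up helper name used only for illustration.

```javascript
// Sample documents from the question, held in memory (no database involved).
var activities = [
  {_id: 1, name: 'swimming', cost: '3.40', location: 'kirkstall'},
  {_id: 2, name: 'cinema', cost: '6.50', location: 'hyde park'},
  {_id: 3, name: 'gig', cost: '10.00', location: 'hyde park'}
];
// Assumption: activities is a map of activity _id -> planned count per year.
var people = [
  {_id: 1, name: 'russell', activities: {1: 9, 2: 4, 3: 21}}
];

// Hypothetical helper: sums planned counts per activity location.
function totalsByLocation(people, activities) {
  // Build the activity _id -> location lookup once, up front.
  var locationById = {};
  activities.forEach(function (a) {
    locationById[a._id] = a.location;
  });

  var totals = {};
  people.forEach(function (person) {
    Object.keys(person.activities).forEach(function (id) {
      var location = locationById[id];
      totals[location] = (totals[location] || 0) + person.activities[id];
    });
  });
  return totals;
}

console.log(totalsByLocation(people, activities));
// { kirkstall: 9, 'hyde park': 25 }
```

The point of the sketch is that the lookup table only has to be built once; the open question is how to get the same table into a MapReduce job.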

My best shot at the moment (which is pretty rubbish) is creating a stored javascript function that accepts an array of activity_ids, queries the activity collection, and returns a map of activity_id to location. I'd then call this in the map function and look up locations from it. The problem is that the same query on the activities collection would be run once for every document in the people collection.

1 answer
Fickle 薄情
#2 · 2019-08-07 09:55

I did this by wrapping the MapReduce in some stored javascript.

function (query) {

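  // Collect the activity ids referenced by every matching person,
  // not just the first one, so no location lookup comes back undefined.
  var activity_ids = [];
  db.people.find(query).forEach(function(person){
    for (var k in person.activities){
      activity_ids.push(parseInt(k, 10));
    }
  });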

  var activity_location_map = {};
  // The documents are keyed by _id, so match and index on that field.
  db.activities.find({_id : {$in : activity_ids}}).forEach(function(a){
    activity_location_map[a._id] = a.location;
  });


  return db.people.mapReduce(
    function map(){
      for (var k in this.activities){
        // Emit once per planned activity; a second emit would double the totals.
        emit({location : activity_location_map[k]} , { total: this.activities[k] });
      }
    },
    function reduce(key, values){
      var reduced = {total: 0};
      values.forEach(function(value){
        reduced.total += value.total;
      });

      return reduced;
    },
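    // Restrict the job to the same people used to build the lookup, and pass the lookup in via scope.
    {out : {inline: true}, query : query, scope : { activity_location_map : activity_location_map }}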
  ).results;
}

Annoying, and messy, but it works, and I can't think of owt better.
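To check the map and reduce logic without a MongoDB instance, the same idea can be run through a small in-memory stand-in. This is only a sketch: runMapReduce is a hypothetical helper, not a MongoDB API, it hands emit and the scope object to map as arguments instead of injecting them as globals, and the second person is made-up test data.

```javascript
// Hypothetical in-memory stand-in for mapReduce; not a MongoDB API.
function runMapReduce(docs, map, reduce, scope) {
  var groups = {};
  // Group emitted values under a serialised key.
  function emit(key, value) {
    var k = JSON.stringify(key);
    (groups[k] = groups[k] || []).push(value);
  }
  docs.forEach(function (doc) { map.call(doc, emit, scope); });
  // Unlike the server, this calls reduce even for keys with a single value.
  return Object.keys(groups).map(function (k) {
    return {_id: JSON.parse(k), value: reduce(JSON.parse(k), groups[k])};
  });
}

// Same logic as the map function above, with emit and scope passed in.
function map(emit, scope) {
  for (var k in this.activities) {
    emit({location: scope.activity_location_map[k]}, {total: this.activities[k]});
  }
}

function reduce(key, values) {
  var reduced = {total: 0};
  values.forEach(function (value) { reduced.total += value.total; });
  return reduced;
}

var people = [
  {_id: 1, name: 'russell', activities: {1: 9, 2: 4, 3: 21}},
  {_id: 2, name: 'sam', activities: {1: 2, 3: 1}} // made-up second person
];
var scope = {
  activity_location_map: {1: 'kirkstall', 2: 'hyde park', 3: 'hyde park'}
};

var results = runMapReduce(people, map, reduce, scope);
console.log(JSON.stringify(results));
// [{"_id":{"location":"kirkstall"},"value":{"total":11}},{"_id":{"location":"hyde park"},"value":{"total":26}}]
```

Because reduce returns the same shape it receives ({total: n}), its output can be fed back into it, which is what MapReduce requires when a key is reduced in several passes.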
