Join operation with MongoDB MapReduce

I've been using MapReduce before to perform classical MR operation, the equivalent of GROUP BY in SQL.

I was wondering if it would be conceptually possible to perform a JOIN operation with MapReduce. Any idea how that could be implemented? Does it make sense to use MapReduce for this kind of operation?

Thanks!

标签： mongodb mapreduce

2条回答

一纸荒年 Trace。

2楼-- · 2020-07-11 06:37

MongoDB doesn't support relational operations likes joins. Instead, you can denormalise your data by embedding the rows you'd JOIN on inside the outer document. So instead of joining Products to Sales, you could have a products collection with this schema:

products

{
    _id: 123,
    name: "Widget",
    price: 9.99
    sales:
    [ 
        { id:1, date: "20100316", howMany: 2 },
        { id:2, date: "20100316", howMany: 5 }
    ]
}

Then whenever you retrieve a product, you also get its sales data so there's no need to join or lookup the info somewhere else.

Alternatively, you could split into two collections as you might with a relational database, then use an additional query to get a product's sales, something like this:

SQL: SELECT Sales WHERE ProductId = 123

MongoDB: db.sales.find( { productid: 123 } )

products

{
    _id: 123,
    name: "Widget",
    price: 9.99
}

sales

{
    id: 1,
    productid: 123,
    date: "20100316",
    howMany: 2 
}

{
    id: 2,
    productid: 123,
    date: "20100316",
    howMany: 5
}

0人赞添加讨论(0) 举报

Deceive 欺骗

3楼-- · 2020-07-11 06:38

My approach is below :

having a look to hadoop I have find CompositeInputFormat approach brefily, it takes two or more collections as an input for map-reduce job

according to my investigation mongodb dont provide this yet. mongodb mapReduce is performed on one colletion at a time.(please correct if I am worng)

so I have decided to put the collections that need to be joined in one collection on wich I will perform the mapreduce for "sql right join"

this is from my log reporter project. the first phase map-reduce is enough to perform right join in case "no clock". the second phase map-reduce has the aim to exclude superfluous right join caused by clock field.

db.test.drop();
db.test.insert({"username" : 1, "day" : 1, "clock" : 0 });
db.test.insert({"username" : 1, "day" : 1, "clock" : 1 });
db.test.insert({"username" : 1,  startDay : 1,endDay:2, "table" : "user" });

//startDay : 1,endDay:2 are used to define the employers working day (join to company - left the company)
//you can use an array instedad of array here. for example day:[1,2,3, ...]

m1 = function(){
   if( typeof this.table!= "undefined" && this.table!=null){
       username = this.username;
       startDay = this.startDay;
       endDay   = this.endDay;
       while(startDay<=endDay){
           emit({username:username,day:startDay},{clocks:["join"]});
          // emit({username:username,day:startDay},1);
           startDay++;
       }
   }else{
       emit({username:this.username,day:this.day},{clocks:[this.clock]});
   }
}
r1 = function(key,values){
    result = {clocks:[]}
    values.forEach(function(x){
        result.clocks = x.clocks.concat(result.clocks);
        result.clocks=result.clocks.filter(function(element, index, array){
            return element!="join";            
        })
    })
    return result;
}

db.test.mapReduce(m1,r1,{out:"result1"})
db.test.find();
db.result1.find();

m2=function(){
   key=this._id;
   this.value.clocks.forEach(function(x){
       key.clock=x;
       emit(key,1);       
   })   
}
r2 = function(key,values){
    value=0;
    values.forEach(function(x){
        value+=1;      
    })
    return result;
}

db.result1.mapReduce(m2,r2,{out:"result2"})
db.test.find();
db.result2.find();

0人赞添加讨论(0) 举报

Join operation with MongoDB MapReduce

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间