In MongoDB mapreduce, how can I flatten the values

I'm trying to use MongoDB to analyse Apache log files. I've created a receipts collection from the Apache access logs. Here's an abridged summary of what my models look like:

db.receipts.findOne()
{
    "_id" : ObjectId("4e57908c7a044a30dc03a888"),
    "path" : "/videos/1/show_invisibles.m4v",
    "issued_at" : ISODate("2011-04-08T00:00:00Z"),
    "status" : "200"
}

I've written a MapReduce function that groups all data by the issued_at date field. It summarizes the total number of requests, and provides a breakdown of the number of requests for each unique path. Here's an example of what the output looks like:

db.daily_hits_by_path.findOne()
{
    "_id" : ISODate("2011-04-08T00:00:00Z"),
    "value" : {
        "count" : 6,
        "paths" : {
            "/videos/1/show_invisibles.m4v" : {
                "count" : 2
            },
            "/videos/1/show_invisibles.ogv" : {
                "count" : 3
            },
            "/videos/6/buffers_listed_and_hidden.ogv" : {
                "count" : 1
            }
        }
    }
}

How can I make the output look like this instead:

{
    "_id" : ISODate("2011-04-08T00:00:00Z"),
    "count" : 6,
    "paths" : {
        "/videos/1/show_invisibles.m4v" : {
            "count" : 2
        },
        "/videos/1/show_invisibles.ogv" : {
            "count" : 3
        },
        "/videos/6/buffers_listed_and_hidden.ogv" : {
            "count" : 1
        }
    }
}

标签： mongodb mapreduce

7条回答

你好瞎i

2楼-- · 2019-01-13 10:45

AFAIK, by design Mongo's map reduce will spit results out in "value tuples" and I haven't seen anything that will configure that "output format". Maybe the finalize() method can be used.

You could try running a post-process that will reshape the data using

results.find({}).forEach( function(result) {
  results.update({_id: result._id}, {count: result.value.count, paths: result.value.paths})
});

Yep, that looks ugly. I know.

0人赞添加讨论(0) 举报

ゆ、 Hurt°

3楼-- · 2019-01-13 10:48

You can do Dan's code with a collection reference:

    function clean(collection) { 
      collection.find().forEach( function(result) {
      var value = result.value;
      delete value._id;     
      collection.update({_id: result._id}, value);     
      collection.update({_id: result.id}, {$unset: {value: 1}} ) } )};

0人赞添加讨论(0) 举报

趁早两清

4楼-- · 2019-01-13 10:50

A similar approach to that of @ljonas but no need to hardcode document fields:

db.results.find().forEach( function(result) {
    var value = result.value;
    delete value._id;
    db.results.update({_id: result._id}, value);
    db.results.update({_id: result.id}, {$unset: {value: 1}} )
} );

0人赞添加讨论(0) 举报

何必那么认真

5楼-- · 2019-01-13 10:52

It's not currently possible, but I would suggest voting for this case: https://jira.mongodb.org/browse/SERVER-2517.

0人赞添加讨论(0) 举报

聊天终结者

6楼-- · 2019-01-13 10:55

Taking the best from previous answers and comments:

db.items.find().hint({_id: 1}).forEach(function(item) {
    db.items.update({_id: item._id}, item.value);
});

From http://docs.mongodb.org/manual/core/update/#replace-existing-document-with-new-document
"If the update argument contains only field and value pairs, the update() method replaces the existing document with the document in the update argument, except for the _id field."

So you need neither to $unset value, nor to list each field.

From https://docs.mongodb.com/manual/core/read-isolation-consistency-recency/#cursor-snapshot "MongoDB cursors can return the same document more than once in some situations. ... use a unique index on this field or these fields so that the query will return each document no more than once. Query with hint() to explicitly force the query to use that index."

0人赞添加讨论(0) 举报

【Aperson】

7楼-- · 2019-01-13 10:57

All the proposed solutions are far from optimal. The fastest you can do so far is something like:

var flattenMRCollection=function(dbName,collectionName) {
    var collection=db.getSiblingDB(dbName)[collectionName];

    var i=0;
    var bulk=collection.initializeUnorderedBulkOp();
    collection.find({ value: { $exists: true } }).addOption(16).forEach(function(result) {
        print((++i));
        //collection.update({_id: result._id},result.value);

        bulk.find({_id: result._id}).replaceOne(result.value);

        if(i%1000==0)
        {
            print("Executing bulk...");
            bulk.execute();
            bulk=collection.initializeUnorderedBulkOp();
        }
    });
    bulk.execute();
};

Then call it: flattenMRCollection("MyDB","MyMRCollection")

This is WAY faster than doing sequential updates.

0人赞添加讨论(0) 举报

1 2 下一页

In MongoDB mapreduce, how can I flatten the values

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间