Search on multiple collections in MongoDB

2019-01-11 01:03发布

问题:

I know the theory of MongoDB and the fact that is doesn't support joins, and that I should use embeded documents or denormalize as much as possible, but here goes:

I have multiple documents, such as:

  • Users, which embed Suburbs, but also has: first name, last name
  • Suburbs, which embed States
  • Child, which embeds School, belongs to a User, but also has: first name, last name

Example:

Users:
{ _id: 1, first_name: 'Bill', last_name: 'Gates', suburb: 1 }
{ _id: 2, first_name: 'Steve', last_name: 'Jobs', suburb: 3 }

Suburb:
{ _id: 1, name: 'Suburb A', state: 1 }
{ _id: 2, name: 'Suburb B', state: 1 }
{ _id: 3, name: 'Suburb C', state: 3 }

State:
{ _id: 1, name: 'LA' }
{ _id: 3, name: 'NY' }

Child:
{ _id: 1, _user_id: 1, first_name: 'Little Billy', last_name: 'Gates' }
{ _id: 2, _user_id: 2, first_name: 'Little Stevie', last_name: 'Jobs' }

The search I need to implement is on:

  • first name, last name of Users and Child
  • State from Users

I know that I have to do multiple queries to get it done, but how can that be achieved? With mapReduce or aggregate?

Can you point out a solution please?

I've tried to use mapReduce but that didn't get me to have documents from Users which contained a state_id, so that's why I brought it up here.

回答1:

This answer is outdated. Since version 3.2, MongoDB has limited support for left outer joins with the $lookup aggregation operator

MongoDB does not do queries which span multiple collections - period. When you need to join data from multiple collections, you have to do it on the application level by doing multiple queries.

  1. Query collection A
  2. Get the secondary keys from the result and put them into an array
  3. Query collection B passing that array as the value of the $in-operator
  4. Join the results of both queries programmatically on the application layer

Having to do this should be rather the exception than the norm. When you frequently need to emulate JOINs like that, it either means that you are still thinking too relational when you design your database schema or that your data is simply not suited for the document-based storage concept of MongoDB.



回答2:

You'll find MongoDB easier to understand if you take a denormalized approach to schema design. That is, you want to structure your documents the way the requesting client application understands them. Essentially, you are modeling your documents as domain objects with which the applicaiton deals. Joins become less important when you model your data this way. Consider how I've denormalized your data into a single collection:

{  
    _id: 1, 
    first_name: 'Bill', 
    last_name: 'Gates', 
    suburb: 'Suburb A',
    state: 'LA',
    child : [ 3 ]
}

{ 
    _id: 2, 
    first_name: 'Steve', 
    last_name: 'Jobs', 
    suburb: 'Suburb C',
    state 'NY',
    child: [ 4 ] 
}
{ 
    _id: 3, 
    first_name: 'Little Billy', 
    last_name: 'Gates',
    suburb: 'Suburb A',
    state: 'LA',
    parent : [ 1 ]
}

{
    _id: 4, 
    first_name: 'Little Stevie', 
    last_name: 'Jobs'
    suburb: 'Suburb C',
    state 'NY',
    parent: [ 2 ]
}

The first advantage is that this schema is far easier to query. Plus, updates to address fields are now consistent with the individual Person entity since the fields are embedded in a single document. Notice also the bidirectional relationship between parent and children? This makes this collection more than just a collection of individual people. The parent-child relationships mean this collection is also a social graph. Here are some resoures which may be helpful to you when thinking about schema design in MongoDB.



回答3:

Here's a JavaScript function that will return an array of all records matching specified criteria, searching across all collections in the current database:

function searchAll(query,fields,sort) {
    var all = db.getCollectionNames();
    var results = [];
    for (var i in all) {
        var coll = all[i];
        if (coll == "system.indexes") continue;
        db[coll].find(query,fields).sort(sort).forEach(
            function (rec) {results.push(rec);} );
    }
    return results;
}

From the Mongo shell, you can copy/paste the function in, then call it like so:

> var recs = searchAll( {filename: {$regex:'.pdf$'} }, {moddate:1,filename:1,_id:0}, {filename:1} ) > recs



回答4:

So now join is possible in mongodb and you can achieve this using $lookup and $facet aggregation here and which is probably the best way to find in multiple collections

db.collection.aggregate([
  { "$limit": 1 },
  { "$facet": {
    "c1": [
      { "$lookup": {
        "from": Users.collection.name,
        "pipeline": [
          { "$match": { "first_name": "your_search_data" } }
        ],
        "as": "collection1"
      }}
    ],
    "c2": [
      { "$lookup": {
        "from": State.collection.name,
        "pipeline": [
          { "$match": { "name": "your_search_data" } }
        ],
        "as": "collection2"
      }}
    ],
    "c3": [
      { "$lookup": {
        "from": State.collection.name,
        "pipeline": [
          { "$match": { "name": "your_search_data" } }
        ],
        "as": "collection3"
      }}
    ]
  }},
  { "$project": {
    "data": {
      "$concatArrays": [ "$c1", "$c2", "$c3" ]
    }
  }},
  { "$unwind": "$data" },
  { "$replaceRoot": { "newRoot": "$data" } }
])


回答5:

Based on @brian-moquin and others, I made a set of functions to search entire collections with entire keys(fields) by simple keyword.

It's in my gist; https://gist.github.com/fkiller/005dc8a07eaa3321110b3e5753dda71b

For more detail, I first made a function to gather all keys.

function keys(collectionName) {
    mr = db.runCommand({
        'mapreduce': collectionName,
        'map': function () {
            for (var key in this) { emit(key, null); }
        },
        'reduce': function (key, stuff) { return null; },
        'out': 'my_collection' + '_keys'
    });
    return db[mr.result].distinct('_id');
}

Then one more to generate $or query from keys array.

function createOR(fieldNames, keyword) {
    var query = [];
    fieldNames.forEach(function (item) {
        var temp = {};
        temp[item] = { $regex: '.*' + keyword + '.*' };
        query.push(temp);
    });
    if (query.length == 0) return false;
    return { $or: query };
}

Below is a function to search a single collection.

function findany(collection, keyword) {
    var query = createOR(keys(collection.getName()));
    if (query) {
        return collection.findOne(query, keyword);
    } else {
        return false;
    }
}

And, finally a search function for every collections.

function searchAll(keyword) {
    var all = db.getCollectionNames();
    var results = [];
    all.forEach(function (collectionName) {
        print(collectionName);
        if (db[collectionName]) results.push(findany(db[collectionName], keyword));
    });
    return results;
}

You can simply load all functions in Mongo console, and execute searchAll('any keyword')