Searching in MongoDB

Posted 2019-02-27 11:00

Question:

Imagine you need to implement a search in MongoDB. You have a collection of documents that look like this:

{text: "This is some Text"}
{text: "this is another text hehe"}

Now you want to implement a case-insensitive search that returns all the documents that contain the search term. For example, if you search for "text", it would return both documents. If you search for "hehe", it would return only the second document.

I know you can do this using $regex like this:

db.comments.find({text: {$regex: /.*SEARCH_TERM.*/i}});

Where SEARCH_TERM is a term we're looking for.
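If SEARCH_TERM comes from user input, it probably needs escaping before it goes into the pattern, since characters such as . or * have special meaning in a regex. A minimal sketch, assuming two helper functions (escapeRegex and buildFilter are hypothetical names, not part of MongoDB):

```javascript
// Hypothetical helper: escape regex metacharacters in a user-supplied term.
function escapeRegex(term) {
    return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a case-insensitive "contains" filter for find().
function buildFilter(term) {
    return { text: { $regex: escapeRegex(term), $options: "i" } };
}

// Usage in the shell (assumes the comments collection from the question):
// db.comments.find(buildFilter("hehe"))
```

The leading and trailing .* in the original pattern are not needed for a "contains" match, so the sketch leaves them out.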

I'm wondering if there is a better way to do this, because searching via regex seems like a bad idea: an unanchored, case-insensitive pattern like this cannot make effective use of an index, so every document has to be scanned.

My idea is that you could somehow tokenize the text in each document, so you would have documents like this:

{text: ["This", "is", "some", "Text"]}
{text: ["this", "is", "another", "text", "hehe"]}

and then index these arrays. Is there any better way to do this?
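The tokenizing idea could be sketched on the client side roughly as follows. This is only an illustration under assumptions: tokenize and withTokens are hypothetical helpers, the field name words is made up, and the tokenizer only handles ASCII letters and digits.

```javascript
// Hypothetical helper: split text into lowercase word tokens (ASCII only).
function tokenize(text) {
    return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Hypothetical helper: copy the document and add the tokens in a
// separate field, so the original text is preserved.
function withTokens(doc) {
    return Object.assign({}, doc, { words: tokenize(doc.text) });
}

// In the shell, roughly:
// db.comments.insertOne(withTokens({text: "this is another text hehe"}))
// db.comments.createIndex({words: 1})             // multikey index on the array
// db.comments.find({words: tokenize("Hehe")[0]})  // exact-word lookup
```

Note that this matches whole words only: a search for "tex" would not find "text", unlike the regex version.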

Answer 1:

Maybe full-text search is your answer: http://docs.mongodb.org/manual/core/index-text/ and http://docs.mongodb.org/manual/reference/operator/query/text/

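Code snippets adapted from these references (the index is created on the text field used in the question, and createIndex replaces the deprecated ensureIndex):

1 - db.comments.createIndex( { text: "text" } )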

The following query searches for comments that contain the words This or another but do not contain the term hehe. Matching is case insensitive by default and works on whole words, not substrings:

2 - db.comments.find( { $text: { $search: "This another -hehe" } } )
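The $search string follows a simple convention: space-separated terms are OR-ed together, and a leading minus sign excludes a term. A small sketch of building such a query from two lists, where buildTextQuery is a hypothetical helper and not a MongoDB function:

```javascript
// Hypothetical helper: build a $text query from terms to include
// (OR-ed together) and terms to exclude (prefixed with "-").
function buildTextQuery(include, exclude) {
    const parts = include.concat(exclude.map(function (term) {
        return "-" + term;
    }));
    return { $text: { $search: parts.join(" ") } };
}

// Usage in the shell (requires a text index on the collection):
// db.comments.find(buildTextQuery(["This", "another"], ["hehe"]))
```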


Answer 2:

Might be fun to do a Map Reduce:

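// Split each document's text on whitespace and emit the word list once per document.
var mapper = function () {
    var words = this.text.match(/\S+/g) || [];
    emit(this._id, {words: words});
};

// Each _id is emitted only once, so reduce is normally not invoked;
// if it is, return the first value unchanged (same shape as the emitted value).
var reducer = function (key, values) {
    return values[0];
};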

Running it with db.comments.mapReduce(mapper, reducer, {out: <output collection name>}) should get you a collection with the words separated out. There's probably a way of doing this with aggregations.
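One possible aggregation version, as a sketch only: it assumes MongoDB 3.4 or later (for $split), and the output field name words is an assumption. It lowercases the text and splits it on single spaces, so consecutive spaces would produce empty strings in the array.

```javascript
// Pipeline: lowercase the text field, then split it on single spaces.
// The output field name "words" is an assumption for illustration.
const tokenizePipeline = [
    { $project: { words: { $split: [ { $toLower: "$text" }, " " ] } } }
];

// Usage in the shell:
// db.comments.aggregate(tokenizePipeline)
```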