MongoDB: How to index unknown fields

2019-08-23 11:47发布

问题:

We're storing documents with a undefined structure. I mean, it has a basic structure (id, user and creationTimestamp), but is also there a Map<String, Object> values fields, where we are able to store whichever structure:

public class Metadata {
    private String id;
    private String user;
    private Date creationTimestamp;
    private Map<String, Object> values;
}

Example:

> db.metadata.find();
{
    "_id" : "Doc2Ref2Mdt1",
    "user" : "user1",
    "creationTimestamp" : ISODate("2018-09-24T12:20:56.958Z"),
    "values" : {
        "ambit" : "ctti",
        "departament" : "economia"
    }
},
{
    "_id" : "Doc1Ref2Mdt1",
    "user" : "user2",
    "creationTimestamp" : ISODate("2018-09-24T12:20:56.169Z"),
    "values" : {
        "date" : ISODate("2018-09-24T12:20:56.171Z"),
        "number" : 16,
        "address" : {
        "street" : "Av. Diagonal",
        "location" : "barcelona",
        "credentials" : [
        {
            "password" : "pwd",
            "login" : "main"
        },
        {
            "password" : "pwd",
            "login" : "other",
            "creation" : ISODate("2018-09-24T12:20:56.171Z")
        }],
        "contact" : "contact name",
        "tags" : ["tag1", "tag2"}]
    }
}

So, you can see values can store any structure.

I need to know if mongodb is able to automatically index all of them.

I mean, when a new field is "added" into values, for example, values.newfield it is indexed automatically.

Any ideas?

回答1:

you can create an index on a subdocument, and new field will be added automatically,

BUT

To use this index with your queries, you must provide as parameter complete and ordered document.

Example, with your sample :

db.metadata.createIndex({"values",1});

db.metadata.find("values.ambit":"ctti")

==> Will NOT use the index, but will return first document.

db.metadata.find(values:{ambit:"ctti"})

==> Will use the index, but return no document.

db.metadata.find(values:{"departament" : "economia", ambit:"ctti"})

==> Will use the index, but return no document, due to different fields order.

db.metadata.find(values:{ambit:"ctti", "departament" : "economia"})

==> Will use the index, and return return first document.



回答2:

Not possible in the way you want it.

You can try text index on all fields followed by the actual query. The text search alone may result with false-positive matches, but aggregated with the normal query it will reduce the dataset to scan and in most cases will speed it up.

There are few limitations to keep in mind:

  • only string fields will be indexed, e.g. .find({ $text: { $search: "2018-09-24" } }) won't return anything. Neither will .find({ $text: { $search: "16" } })
  • only full words queries, i.e. no regexps. You will need to use language: "none" to keep stopwords and don't use steming. Queries like .find({ $text: { $search: "barcel" } }) won't find the second document.

The first limitation can be worked around by serialisation of values map to a string field on write, if you control all writes. Then you will need to create a text index on this field only.