JSON Schema with dynamic key field in MongoDB

Posted 2019-04-19 08:46

We want to have i18n support for objects stored in a MongoDB collection.

Currently our schema is like:

{
  _id: "id",
  name: "name",
  localization: [{
    lan: "en-US",
    name: "name_in_english"
  }, {
    lan: "zh-TW",
    name: "name_in_traditional_chinese"
  }]
}

But since the "lan" field is unique, my thought is: can I just use it as a key, so the structure would be

{
  _id: "id",
  name: "name",
  localization: {
    "en-US": "name_in_english",
    "zh-TW": "name_in_traditional_chinese"
  }
}

which would be neater and easier to parse (just localization[language] would get the value I want for a specific language).

But then the question is: is this good practice for storing data in MongoDB? And how do I pass the json-schema check?
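For reference, standard JSON Schema can describe objects with dynamic keys via the patternProperties keyword, so something like the following might express what I'm aiming for (a sketch only; it assumes two-part language tags like en-US, and real tags can be more varied):

```json
{
  "type": "object",
  "properties": {
    "_id": { "type": "string" },
    "name": { "type": "string" },
    "localization": {
      "type": "object",
      "patternProperties": {
        "^[a-z]{2}-[A-Z]{2}$": { "type": "string" }
      },
      "additionalProperties": false
    }
  }
}
```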

3 Answers
虎瘦雄心在
#2 · 2019-04-19 09:12

Obviously the second schema example is much better for your task (provided the lan field really is unique, as you mentioned, which seems true to me as well).

Getting an element from a dictionary/associative array/map/whatever_it_is_called_in_your_language is much cheaper than scanning a whole array of values. In the current case it is also more efficient in terms of storage size: remember that all field names are stored in MongoDB documents as-is, so every record holds the full key name for each field, not some compact representation or index of it.

My experience shows that MongoDB is mature enough to be used as the main storage for your application, even under high load (whatever that means ;) ). The main problem is how you deal with database-level locks (the promised collection-level locking should speed MongoDB up considerably, I hope). Data loss is also possible if your MongoDB cluster is built badly; dig into the docs and articles around the Internet for more information.

As for the schema check, you have to do it in your application code before inserting records; yes, that is why Mongo is called schemaless.

神经病院院长
#3 · 2019-04-19 09:21

There is a case where an object is necessarily better than an array: supporting upserts into a set. For example, if you want to update an item having name 'item1' to have val 100, or insert such an item if one doesn't exist, all in one atomic operation. With an array, you'd have to do one of two operations. Given a schema like

{ _id: 'some-id', itemSet: [ { name: 'an-item', val: 123 } ] }

you'd have commands

// Update:
db.coll.update(
  { _id: id, 'itemSet.name': 'item1' },
  { $set: { 'itemSet.$.val': 100 } }
);

// Insert:
db.coll.update(
  { _id: id, 'itemSet.name': { $ne: 'item1' } },
  { $addToSet: { 'itemSet': { name: 'item1', val: 100 } } }
);

You'd have to query first to know which is needed in advance, which can exacerbate race conditions unless you implement some versioning. With an object, you can simply do

db.coll.update(
  { _id: id },
  { $set: { 'itemSet.item1': { val: 100 } } }
);

If this is a use case you have, then you should go with the object approach. One drawback is that querying for a specific name requires scanning. If that is also needed, you can add a separate array specifically for indexing. This is a trade-off with MongoDB. Upserts would become

db.coll.update(
  { _id: id },
  {
    $set: { 'itemSet.item1': { val: 100 } },
    $addToSet: { itemNames: 'item1' }
  }
);

and the query would then simply be

db.coll.find({ itemNames: 'item1' })

(Note: the $ positional operator does not support array upserts.)

Emotional °昔
#4 · 2019-04-19 09:36

It is not good practice to have values as keys. The language codes are values, and as you say you can not validate them against a schema. It also makes querying awkward: for example, you can't easily find all documents that have a translation for "nl-NL", because you would be matching against keys rather than values, and there is no easy way to index that. You should always have descriptive keys.

However, as you say, having the languages as keys makes it a lot easier to pull the data out as you can just access it by ['nl-NL'] (or whatever your language's syntax is).

I would suggest an alternative schema:

{
    your_id: "id_for_name",
    lan: "en-US",
    name: "name_in_english"
}
{
    your_id: "id_for_name",
    lan: "zh-TW",
    name: "name_in_traditional_chinese"
}

Now you can:

  • set an index on { your_id: 1, lan: 1 } for speedy lookups
  • query for each translation individually and just get that translation:
    db.so.find( { your_id: "id_for_name", lan: 'en-US' } )
  • query for all the versions for each id using this same index:
    db.so.find( { your_id: "id_for_name" } )
  • and also update the translation for a specific language much more easily:

    db.so.update(
        { your_id: "id_for_name", lan: 'en-US' }, 
        { $set: { name: "ooga" } } 
    )
    

None of those points are possible with your suggested schemas.
