Performance disadvantage using slug as primary key

Posted 2019-04-12 00:51

Let's take for example a blog post where a unique slug is generated from the post's title: sample_blog_post. Instead of storing a MongoDB ObjectId as the _id, say you store the slug in the _id. Besides the obvious case where the slug may change if the title changes, are there major performance disadvantages to using a string instead of a numerical _id? This could become problematic if, say, the number of posts became very large, say, over a million. But if the number of posts was relatively low, say, 2000, would it make much of a difference? So far the only thing about the ObjectId that I think I'd take advantage of is the created_on date that comes for free.

So in summation, is it worth it to store the slug as the _id and not use an ObjectId? There seems to be discussion on how to store alternate values as an _id, but not the performance advantages/disadvantages to it.
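
For reference, the "free" creation date mentioned above lives in the first four bytes of an ObjectId, which hold a Unix timestamp in seconds. A minimal sketch in Python of reading it from the hex form; the helper name and the example id are made up for illustration, and drivers normally expose this directly (PyMongo's ObjectId has a generation_time attribute):

```python
from datetime import datetime, timezone


def objectid_created_at(oid_hex: str) -> datetime:
    """Return the creation time embedded in a 24-character ObjectId hex string."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    # The first 4 bytes (8 hex characters) are a big-endian Unix timestamp.
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Made-up id for illustration: a timestamp prefix followed by zero padding.
example_id = "5cafe1f0" + "0" * 16
created = objectid_created_at(example_id)
```

With a slug as the _id you would have to store such a timestamp in a field of its own.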

1 Answer
看我几分像从前
#2 · 2019-04-12 01:32

So in summation, is it worth it to store the slug as the _id and not use an ObjectId?

In my opinion, no. The performance difference will be negligible for most scenarios (except paging), but there are other reasons to keep the ObjectId:

  • The old discussion of surrogate primary keys comes up. A "slug" is not a very natural key. Yes, it must be unique, but as you already pointed out, changing the slug shouldn't be impossible. This alone would keep me from bothering...
  • Having a monotonic _id key can save you from a number of headaches, most importantly by letting you avoid expensive paging via skip and limit (use $lt/$gt on the _id instead).
  • MongoDB versions before 4.2 limit an index key to less than 1024 bytes. While not pretty, URLs are allowed to be a lot longer. If someone entered a longer slug, it wouldn't be found: before 2.6 the document was stored but silently left out of the index, and later versions reject the write instead.
  • It's a good idea to have a consistent interface, i.e. to use the same type of _id on all, or at least, most of your objects. In my code, I have a single exception where I'm using a special hash as id because the value can't change, the collection has extremely high write rates and it's large.
  • Let's say you want to link to the article in your management interface (not the public site). Which link would you use? Normally the id, but now the id and the slug are the same thing. A simple bug (such as allowing an empty slug) would then be hard to recover from, because the user couldn't even reach the article in the management interface anymore.
  • You'll be dealing with charset issues. I'd suggest not even using the slug itself to look up the article, but a hash of the slug.
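
To make the paging point concrete, here is a rough sketch in Python of range-based paging on _id. The function names are hypothetical, and fetch_page is only an in-memory stand-in so the example runs without a server; against a real PyMongo collection the equivalent would be something like collection.find(next_page_query(last_id)).sort("_id", 1).limit(page_size).

```python
from typing import Any, Optional


def next_page_query(last_seen_id: Optional[Any]) -> dict:
    """Build the filter for the page that follows last_seen_id.

    None means "first page", so no filter is applied.
    """
    return {} if last_seen_id is None else {"_id": {"$gt": last_seen_id}}


def fetch_page(docs: list, last_seen_id: Optional[Any], page_size: int) -> list:
    """In-memory stand-in for find(filter).sort("_id", 1).limit(page_size)."""
    lower = next_page_query(last_seen_id).get("_id", {}).get("$gt")
    matching = [d for d in docs if lower is None or d["_id"] > lower]
    return sorted(matching, key=lambda d: d["_id"])[:page_size]


# Integers stand in for ObjectIds here; both sort in insertion order.
posts = [{"_id": i, "slug": f"post-{i}"} for i in range(1, 8)]

page1 = fetch_page(posts, None, 3)
# The last _id of the previous page is the cursor for the next one.
page2 = fetch_page(posts, page1[-1]["_id"], 3)
```

The server can seek straight to last_seen_id in the _id index, whereas skip(n) has to walk past n entries first. With a slug as _id the same trick still works, but the pages come back in alphabetical rather than chronological order.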

Essentially, you'd end up with a schema like

{ "_id" : ObjectId("a237b45..."), // PK
  "slug" : "mongodb-is-fun", // not indexed
  "hash" : "5af87c62da34" } // indexed, unique
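
The answer doesn't say which hash function produces the "hash" field in the schema, so the following Python sketch is only one possible choice: it assumes a truncated SHA-1 over the normalized, UTF-8 encoded slug. The function names, the normalization steps, and the 12-character length are illustrative assumptions.

```python
import hashlib
import unicodedata


def slug_hash(slug: str, length: int = 12) -> str:
    """Return a fixed-length hex key for a slug (assumed scheme: truncated SHA-1)."""
    # Normalize first so visually identical slugs map to the same key,
    # which sidesteps most charset issues at lookup time.
    normalized = unicodedata.normalize("NFC", slug.strip().lower())
    digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
    return digest[:length]


def make_post(slug: str) -> dict:
    """Build the slug-related fields of a post; _id is left to the driver."""
    return {"slug": slug, "hash": slug_hash(slug)}


post = make_post("mongodb-is-fun")
# Lookups would then filter on the indexed field: {"hash": slug_hash(requested_slug)}
```

The key always has the same short length no matter how long the slug is, so it stays well under any index key limit. Truncating the digest does raise the chance of collisions, which the unique index on "hash" would surface as a failed insert.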