natural key vs surrogate key an innodb foreign key

2020-02-09 09:28发布

问题:

A question:

I have 2 tables:

Product
id INT
name VARCHAR(64)
something TEXT
else INT
entirely BOOL

and

Ingredient
id INT
name VARCHAR(64)
description TEXT

Now I also have a link table

Products_Ingredients
product_id INT
ingredient_id INT

for my many to many relation.

Now both products and ingredients will have unique names. So I can use names as natural keys... however will that be a good idea?

Say I have a product: Paint Thinner Supreme with ingredient: Butylonitrotetrocycline

Will that be a good idea to use those names as composite key in the link table?

As much as I understand idea behind using natural keys over the surrogates, I kinda can't stop thinking that using simple integers as primary keys (and foreign ones) will be much faster. Will there be a difference in a way in which MySQL server digests those different keys?

What is your opinion?

回答1:

Opinions don't matter when you can measure.

I implemented this on PostgreSQL using both natural keys and surrogates. I used 300,000 total products, 180 ingredients, and populated two "product ingredient" tables with 3 to 17 ingredients per product, for 100,000 randomly selected products (1053462 rows).

Selecting all the ingredients for a single product using natural keys returned in 0.067 ms. Using surrogates, 0.199ms.

Returning all the non-id columns for a single product using natural keys returned in 0.145 ms. Using surrogates, 0.222 ms

So natural keys were about 2 to 3 times faster on this data set.

Natural keys don't require any joins to return this data. Surrogate keys require two joins.

The actual performance difference depends on the width of your tables, number of rows, page size, and length of names, and things like that. There will be a point where surrogate keys start outperforming natural keys, but few people try to measure that.

When I was designing the database for my employer's operational database, I built a testbed with tables designed around natural keys and with tables designed around id numbers. Both those schemas have more than 13 million rows of computer-generated sample data. In a few cases, queries on the id number schema outperformed the natural key schema by 50%. (So a complex query that took 20 seconds with id numbers took 30 seconds with natural keys.) But 80% of the test queries had faster SELECT performance against the natural key schema. And sometimes it was staggeringly faster--a difference of 30 to 1.

We expect natural keys to outperform surrogates in our database for years to come. (Unless we move certain tables over to an SSD, in which case natural keys will probably outperform surrogates forever.)



回答2:

For this case I'd prefere surrogate keys because

  1. the name of a product or an ingredient may change, especially if your content is user generated (e.g. typos or there are several possible names for an item)
  2. your natural keys will be much longer than and therefore be less efficient