While reading through the Hibernate documentation, I keep seeing references to the concept of a natural identifier.
Does this just mean the id an entity has due to the nature of the data it holds?
E.g. a user's name + password + age + something else, used together as a compound identifier?
What naturally identifies an entity. For example, my email address.
However, a long, variable-length string is not an ideal key, so you may want to define a surrogate id.
AKA natural key in relational design.
A natural identifier is something that is used in the real world as an identifier. An example is a social security number, or a passport number.
It is usually a bad idea to use natural identifiers as keys in a persistence layer because a) they can be changed outside of your control, and b) they can end up not being unique due to a mistake elsewhere, and then your data model can't handle it so your application blows up.
In Hibernate, natural keys are often used for lookups. You will have an auto-generated surrogate id in most cases. But this id is rather useless for lookups, as you'll always query by fields like name, social security number or anything else from the real world.
When using Hibernate's caching features, this difference is very important: If the cache is indexed by your primary key (surrogate id), there won't be any performance gain on lookups. That's why you can define a set of fields that you are going to query the database with - the natural id. Hibernate can then index the data by your natural key and improve the lookup performance.
See this excellent blog post for a more detailed explanation or this Red Hat page for an example Hibernate mapping file.
A social security number might be a natural identity, or as you've said a hash of the User's information. The alternative is a surrogate key, for example a GUID/UUID.
A natural identifier (also known as a business key) is an identifier that means or represents something in real life, for example:
Email or national ID for a person
ISBN for a book
IBAN for a bank account
The @NaturalId annotation is used to specify a natural identifier.

In a relational database system, typically, you can have two types of simple identifiers: Natural Keys and Surrogate Keys.
The reason why Surrogate Keys are so popular is that they are more compact (4 bytes or 8 bytes), compared to a Natural Key, which can be considerably longer (e.g. the VIN takes 17 alphanumerical characters, the book ISBN is 13 digits long).
Now, if the Surrogate Key becomes the Primary Key, you map it using the JPA @Id annotation.

And, if you have an entity that also has a Natural Key, besides the Surrogate one, you can map it with the Hibernate-specific @NaturalId annotation.

Now, considering an entity mapped this way, the user might have bookmarked a
Post article and now they want to read it. However, the bookmarked URL contains the slug Natural Identifier, not the Primary Key.

So, we can fetch it like this using Hibernate:
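The original snippets did not survive in this copy of the thread, so what follows is only a minimal sketch of the mapping and lookup described above, not the answer author's exact code. The Post fields, the PostLookup helper and the jakarta.persistence imports are assumptions (older Hibernate versions use javax.persistence instead).

```java
import jakarta.persistence.Entity;
import jakarta.persistence.EntityManager;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.Id;

import org.hibernate.Session;
import org.hibernate.annotations.NaturalId;

// Hypothetical entity: the field names are assumptions for illustration.
@Entity
class Post {

    @Id
    @GeneratedValue
    private Long id;        // surrogate key, used as the primary key

    private String title;

    @NaturalId
    private String slug;    // natural identifier, e.g. taken from the URL

    public Long getId() { return id; }
    public String getTitle() { return title; }
    public String getSlug() { return slug; }
}

// Hypothetical helper showing the natural-id lookup.
class PostLookup {

    static Post findBySlug(EntityManager entityManager, String slug) {
        return entityManager
            .unwrap(Session.class)            // get the Hibernate Session behind JPA
            .bySimpleNaturalId(Post.class)    // entity has a single @NaturalId attribute
            .load(slug);                      // returns null if nothing matches
    }
}
```

bySimpleNaturalId fits entities with exactly one @NaturalId attribute; for a compound natural identifier, Session.byNaturalId with one using(...) call per attribute would be the usual alternative.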
And Hibernate will execute the following two queries:
The first query is needed to resolve the entity identifier associated with the provided natural identifier.
The second query is optional if the entity is already loaded in the first or the second-level cache.
As I explained in this article, the reason for having the first query is that Hibernate already has well-established logic for loading and associating entities by their identifier in the Persistence Context.
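To make the two-step resolution concrete, here is a toy in-memory model in plain Java. It is not Hibernate code and makes no claim about Hibernate internals: the class, its maps and the queryCount counter are all hypothetical stand-ins that only mimic the idea of resolving natural id to primary key first, then loading the entity by primary key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class NaturalIdLookupSketch {

    // Hypothetical entity with a surrogate id and a natural id (slug).
    public static final class Post {
        public final long id;
        public final String slug;
        public final String title;

        public Post(long id, String slug, String title) {
            this.id = id;
            this.slug = slug;
            this.title = title;
        }
    }

    // Stand-in for the database table, keyed by the surrogate primary key.
    private final Map<Long, Post> table = new HashMap<>();
    // Stand-in for an entity cache, also keyed by primary key.
    private final Map<Long, Post> entityCache = new HashMap<>();
    // Stand-in for a natural-id cache: natural id -> primary key.
    private final Map<String, Long> naturalIdCache = new HashMap<>();

    // Counts simulated round trips to the "database".
    public int queryCount = 0;

    public void insert(Post post) {
        table.put(post.id, post);
    }

    // Step 1: resolve the primary key for a natural id.
    private Optional<Long> resolveId(String slug) {
        Long cached = naturalIdCache.get(slug);
        if (cached != null) {
            return Optional.of(cached);
        }
        queryCount++; // simulated "select id ... where slug = ?"
        Optional<Long> id = table.values().stream()
            .filter(p -> p.slug.equals(slug))
            .map(p -> Long.valueOf(p.id))
            .findFirst();
        id.ifPresent(value -> naturalIdCache.put(slug, value));
        return id;
    }

    // Step 2: load the entity by primary key, skipping the query on a cache hit.
    private Post loadById(long id) {
        Post cached = entityCache.get(id);
        if (cached != null) {
            return cached;
        }
        queryCount++; // simulated "select ... where id = ?"
        Post post = table.get(id);
        if (post != null) {
            entityCache.put(id, post);
        }
        return post;
    }

    public Post loadBySlug(String slug) {
        return resolveId(slug).map(this::loadById).orElse(null);
    }

    public static void main(String[] args) {
        NaturalIdLookupSketch repo = new NaturalIdLookupSketch();
        repo.insert(new Post(1L, "natural-ids", "Natural identifiers"));

        repo.loadBySlug("natural-ids");
        System.out.println("queries after first lookup: " + repo.queryCount);

        repo.loadBySlug("natural-ids");
        System.out.println("queries after second lookup: " + repo.queryCount);
    }
}
```

In this model the first lookup costs two simulated queries and a repeated lookup costs none, because both the natural-id-to-key mapping and the entity are then served from the maps.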
Now, if you want to skip the entity identifier query, you can easily annotate the entity using the @NaturalIdCache annotation.

This way, you can fetch the Post entity without even hitting the database. Cool, right?
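As a rough sketch of the cached mapping mentioned above (again a reconstruction under assumptions, not the original snippet): the field names and the READ_WRITE strategy are illustrative choices, and the example assumes the second-level cache is enabled and a cache provider is configured.

```java
import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.Id;

import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;
import org.hibernate.annotations.NaturalId;
import org.hibernate.annotations.NaturalIdCache;

// Hypothetical entity: field names and cache strategy are assumptions.
@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE) // cache the entity itself
@NaturalIdCache                                     // cache natural id -> primary key
class Post {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    @NaturalId
    private String slug;
}
```

With both caches warm, a lookup by slug can be answered from the second-level cache; on a cold cache the queries are still executed.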