I've been reading a lot of articles about DDD and noticed that most use a GUID as the ID when persisting to a database. They say that GUIDs scale well and that auto-incrementing IDs are a big no-no when it comes to scalability.
I'm now confused about whether to use a GUID or auto-increment.
Basically, the domain is a membership system (a binary tree) that tracks registered members.
The first requirement is that we need something that uniquely identifies each member in the system (we call it Account No.), perhaps 7 digits long.
New Members can then be registered by another Member. We call this a referral.
What I'm planning to do is have a MemberId of GUID type as the DomainObject ID. It serves as the primary key and is used for joins and foreign keys (on Referral, referer_id would be the GUID MemberId). AccountNo will be an auto-increment column, or perhaps it will be acquired from the repository via MAX() + 1. It will mainly be used for search features in the system and in links.
Should the ID of the DomainObject remain hidden from users of the system, since it's just a technical implementation detail?
Is it OK to combine both: a GUID as the row_id in the database (surrogate key) and an auto-increment column as the natural key?
Is it okay to exclude AccountNo from the constructor because it will be auto-incremented anyway? What about the need to enforce invariants? Or is getting the next ID from the repository and including AccountNo in the constructor the way to go?
Should I just stick with an auto-increment ID and forget about the GUID, removing MemberId and letting AccountNo be the ID of the DomainObject?
NOTE:
I am not building the next Facebook or anything of that kind.
I just want to practice the Tactical side of DDD to learn how to make hard architectural decisions knowing their PROS and CONS.
I just want to practice the Strategic side of DDD to learn how to make hard architectural decisions knowing their PROS and CONS and their implementation.
Suppose we consider three scenarios for member registration:
- First scenario: a member registers every minute.
- Second scenario: a member registers every hour.
- Third scenario: at most 5 members register per day.
How would each scenario affect these decisions?
Technology Stack:
- ASP.NET MVC 5
- SQL Server 2014
- C#
- Dapper ORM
Allow me to defend the auto-increment idea. It is true that GUIDs are more scalable. But an important question any designer should ask at some point is "How much scalability do I need?"
The answer is very rarely "As much as possible!" In the real world, everything has limits. Our databases model the real world.
For example, if you are working with people (users, customers, students, etc.), a 64-bit integer can contain the entire population of the Earth many times over. Very many times. We're talking about "population of the galactic empire" here. A signed bigint holds roughly 9.2 × 10^18 distinct values, which is more than a billion IDs for every person alive today.
Don't get lazy, especially during the design phase. Design in a reasonable margin of safety and move on. Anything more unnecessarily increases the complexity and "friction" of the system.
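To put a number on the headroom of a 64-bit integer, here is a small sketch of the arithmetic. The class and method names (IdCapacity, IdsPerPerson) are made up for illustration, and the population figure used with it is an assumed round number.

```csharp
using System;

public static class IdCapacity
{
    // Largest value a signed 64-bit integer (SQL Server bigint) can hold:
    // 9,223,372,036,854,775,807.
    public const long MaxBigInt = long.MaxValue;

    // How many IDs each person could receive if the positive bigint range
    // were shared out evenly across the given population.
    public static long IdsPerPerson(long population)
    {
        if (population <= 0)
            throw new ArgumentOutOfRangeException(nameof(population));
        return MaxBigInt / population;
    }
}
```

With an assumed population of 8 billion, IdCapacity.IdsPerPerson(8_000_000_000L) comes to 1,152,921,504. Even a 32-bit int tops out at 2,147,483,647, which is far beyond a 7-digit account number.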
My experience in this field is now measured in decades -- several of them. In that time, I have never had to use GUIDs for scalability. The only actual use for GUIDs I have found is when entities are created in different, usually remote, databases which then must be merged together into a central database. GUIDs eliminate (statistically speaking, that is) the possibility of collisions during the merge.
There are probably too many questions in your question to give you a complete answer, because ID design is not simple and has many facets. I can recommend the book "Implementing DDD" by Vaughn Vernon; it has a section dedicated to identity design (Chapter 5 "Entities" - Unique Identity).
I'll try to point you in the right direction anyway, without reciting everything from that chapter :-) .
What do you need?
You already stated some questions regarding ID design, but there are more questions that you need to ask. Only then can you decide whether GUIDs, DB-generated IDs or some other kind of ID are appropriate.
The answers to these questions will constrain the type of ID generation that you can use.
What you should be aware of
There are a few rules regarding ID design. Following them is strongly recommended, so that you don't shoot yourself in the foot later:
Example
Here is an example of how you could arrive at an ID design:
Allowing the ID to be created late (i.e. by the persistence layer) means that a newly created entity has no ID until it is saved. So if you need early or just-in-time IDs, you cannot use DB-generated IDs (unless you accept contacting the DB just for ID retrieval).
Then you may decide that the user is not interested in the ID, so having to specify the ID would be strange. This leaves application-generated IDs.
With application-generated IDs, you need to make sure that the ID is unique. This is trivial for single-instance applications, but may be more problematic as soon as you create a load-balanced setup with more than one app instance. This is the reason why many people use random IDs (such as GUIDs) in the first place, so they don't hit a dead end when they scale out.
As you see, this example makes many assumptions. There just is no right or wrong, but with the questions stated above, you should be able to make an informed decision.
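The application-generated option from the example above could look roughly like the following. This is only a sketch under the same assumptions as the example: the Member class, its Register factory and the ReferrerId property are hypothetical names, and the entity is reduced to the parts that matter for identity.

```csharp
using System;

public sealed class Member
{
    // Identity assigned by the application at creation time,
    // so the entity never exists without an ID.
    public Guid Id { get; }
    public string Name { get; }

    // ID of the Member who referred this one; null for a root member.
    public Guid? ReferrerId { get; }

    private Member(Guid id, string name, Guid? referrerId)
    {
        Id = id;
        Name = name;
        ReferrerId = referrerId;
    }

    // Factory that generates the ID early, without a database round-trip.
    public static Member Register(string name, Guid? referrerId = null)
    {
        if (string.IsNullOrWhiteSpace(name))
            throw new ArgumentException("A member needs a name.", nameof(name));
        return new Member(Guid.NewGuid(), name, referrerId);
    }
}
```

Because Guid.NewGuid() needs no coordination between processes, several app instances can register members at the same time without asking each other or the database for the next value.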
The main scalability problem with auto-increment integers is with the inserts: since new values "bunch up" together at the upper extreme of the value range, they tend to hit the same "side" of a B-Tree (and likely the same leaf node), causing latching and lowering the concurrency.
At just once-a-minute insertion, you simply won't see any of this, so pick your key based on other criteria. As far as you have described, auto-increment integers would serve just fine in your scenario... they are more lightweight and are likely to perform better than GUIDs. And if the 32-bit variant is not wide enough, just use 64-bit.
BTW, auto-increment integers are not generated by querying for MAX() + 1 - every major DBMS has its own version of a high-performance "sequence generator" that you can use directly. You can also return the generated value directly to the client without requiring an additional round-trip (e.g. Oracle's RETURNING or SQL Server's OUTPUT clause). Unfortunately, ORMs don't always play well with that...
You got that wrong. You can't learn strategy by doing tactics. Tactics are ways to implement a strategy. But you need a strategy first.
Anyway about your question, it's quite simple: use a Guid. It has 2 advantages
The natural ID, like AccountNo, should be used too. However, the Guid is there for technical purposes. The natural key's format might change in the future, and the Guid makes it easy to support multiple natural key formats.
As a practice, it's best for your entity ID to be a value object (even if it's just a Guid). You can incorporate the Guid in AccountNo too; a VO doesn't need to be only one value. For example, in my current app, I have a ProjectId(Guid organization, Guid idValue) and a ProjectAssetId(Guid organization, Guid projectId, Guid idValue).
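A composite ID value object along those lines might be sketched as follows. Only the ProjectId name and its two constructor parameters come from the answer; the property names, the empty-Guid checks and the equality implementation are assumptions about how such a type would typically be written.

```csharp
using System;

public sealed class ProjectId : IEquatable<ProjectId>
{
    public Guid Organization { get; }
    public Guid IdValue { get; }

    public ProjectId(Guid organization, Guid idValue)
    {
        // Assumed invariant: neither part of the ID may be the empty Guid.
        if (organization == Guid.Empty)
            throw new ArgumentException("Organization is required.", nameof(organization));
        if (idValue == Guid.Empty)
            throw new ArgumentException("Id value is required.", nameof(idValue));
        Organization = organization;
        IdValue = idValue;
    }

    // Value semantics: two IDs are equal when both parts are equal.
    public bool Equals(ProjectId other) =>
        other != null && Organization == other.Organization && IdValue == other.IdValue;

    public override bool Equals(object obj) => Equals(obj as ProjectId);

    public override int GetHashCode()
    {
        unchecked
        {
            return (Organization.GetHashCode() * 397) ^ IdValue.GetHashCode();
        }
    }
}
```

Wrapping the raw Guid this way lets the compiler catch a ProjectId being passed where some other ID type is expected, and gives the invariants a single home.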