Before you start marking this as a duplicate, hear me out. The other question has a (most likely) incorrect accepted answer.
I do not know how .NET generates its GUIDs (probably only Microsoft does), but there's a high chance it simply calls CoCreateGuid(). That function, however, is documented to call UuidCreate(). And the algorithms for creating a UUID are pretty well documented.
Long story short, it seems that System.Guid.NewGuid() does indeed use the version 4 UUID generation algorithm, because all the GUIDs it generates match the criteria (see for yourself; I tried a couple of million GUIDs and they all matched).
In other words, these GUIDs are almost random, except for a few known bits.
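The version 4 criteria are easy to check mechanically: 4 version bits and 2 variant bits are fixed, which leaves 122 random bits. Here is a minimal sketch in Python rather than .NET, assuming the GUIDs are available in their usual textual form. `looks_like_version4` is a hypothetical helper name, and `uuid.uuid4()` only stands in for whatever generator is being tested.

```python
import uuid

def looks_like_version4(guid_text):
    """Return True if the fixed bits match the RFC 4122 version 4 layout."""
    parsed = uuid.UUID(guid_text)
    # Version 4: the 13th hex digit is '4'.
    # RFC 4122 variant: the 17th hex digit is 8, 9, a or b.
    return parsed.version == 4 and parsed.variant == uuid.RFC_4122

# Stand-in generator; GUID strings dumped from System.Guid.NewGuid() could be fed in instead.
samples = [str(uuid.uuid4()) for _ in range(1000)]
print(all(looks_like_version4(s) for s in samples))
```

Note that this only confirms the layout. It says nothing about how good the remaining 122 bits are.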
This then again raises the question: how random IS this random? As every good little programmer knows, a pseudo-random number algorithm is only as random as its seed (aka entropy). So what is the seed for UuidCreate()? How often is the PRNG re-seeded? Is it cryptographically strong, or can I expect the same GUIDs to start pouring out if two computers accidentally call System.Guid.NewGuid() at the same time? And can the state of the PRNG be guessed if sufficiently many sequentially generated GUIDs are gathered?
Added: To clarify, I'd like to find out how random can I trust it to be and thus - where can I use it. So, let's establish a rough "randomness" scale here:
- Basic randomness, taking current time as the seed. Usable for shuffling cards in Solitaire but little else as collisions are too easy to come by even without trying.
- More advanced randomness, using not only the time but other machine-specific factors for seed. Perhaps also seeded only once on system startup. This can be used for generating IDs in a DB because duplicates are unlikely. Still, it's not good for security because the results can be predicted with sufficient effort.
- Cryptographically random, using device noise or other advanced sources of randomness for seed. Re-seeded on every invocation or at least pretty often. Can be used for session IDs, handed out to untrusted parties, etc.
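To make the first tier concrete, here is a small Python sketch of what goes wrong when the only seed is the current time. It is an illustration of the failure mode, not a claim about how UuidCreate() works. `time_seeded_ids` and the timestamp value are made up for the example.

```python
import random

def time_seeded_ids(seed, count=3):
    """Generate 128-bit IDs from a PRNG seeded with a single integer (e.g. a timestamp)."""
    prng = random.Random(seed)  # Mersenne Twister, not cryptographically secure
    return ["{:032x}".format(prng.getrandbits(128)) for _ in range(count)]

same_moment = 1700000000  # hypothetical timestamp shared by two machines
machine_a = time_seeded_ids(same_moment)
machine_b = time_seeded_ids(same_moment)

# Same seed, same sequence: both "machines" hand out identical IDs.
print(machine_a == machine_b)
```

The second and third tiers differ from this mainly in how much unpredictable input goes into the seed and how often it is refreshed.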
I arrived at this question while wondering whether it would be OK to use them as DB IDs, and whether the Guid.comb algorithm implementation together with System.Guid.NewGuid() (like NHibernate does it) would be flawed or not.
The definition of Random in no way relates to the definition of Globally Unique.
Flipping a coin twice and getting HH, HT, TH, TT are all random. HH is just as random as HT.
Flipping a "special" coin twice and guaranteeing that you will only get HT or TH is uniqueness.
According to https://msdn.microsoft.com/en-us/library/bb417a2c-7a58-404f-84dd-6b494ecf0d13#id11, since Windows 2000 back in 1999,
So I'd consider them cryptographically secure -- at least to the extent of the 122 bits of entropy they provide.
Also see https://stackoverflow.com/a/35384818/284704, where Will Dean verified through a debug-step that the CLR is calling the proper secure OS random generator.
Some people have already hinted at that but I want to repeat it since there appears to be a misconception there:
Randomness and uniqueness are orthogonal concepts.
Random data can be unique or redundant, and likewise unique data can use a random source or a deterministic source (think a global counter that is locked and incremented for every GUID ever created).
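The global counter mentioned above is the simplest example of unique-but-not-random. A minimal Python sketch, with `CounterIdSource` being a hypothetical name used only for illustration:

```python
import itertools
import threading

class CounterIdSource:
    """Unique but entirely predictable IDs: a locked, incrementing counter."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counter = itertools.count(1)

    def next_id(self):
        # The lock ensures two threads can never receive the same value.
        with self._lock:
            return next(self._counter)

source = CounterIdSource()
ids = [source.next_id() for _ in range(5)]
print(ids)  # [1, 2, 3, 4, 5]
```

Every ID is guaranteed distinct, yet anyone who has seen one ID can guess the next, which is exactly why uniqueness alone tells you nothing about suitability for security.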
GUIDs were designed to be unique, not random. If the .NET generator appears to use random input, fine. But don’t rely on it as a source of randomness, neither for cryptographic nor for any other purposes (in particular, what distribution function do you expect to get?). On the other hand, you can be reasonably sure that GUIDs created by .NET, even in large volumes, will be unique.
I read somewhere that the chances of winning the lottery would be equivalent to two 4-byte "GUIDs" colliding. The standard 16-byte GUIDs would offer a far smaller chance of collision.
GUIDs are designed to be at number 2 on your scale, i.e. "can be used for generating IDs in a DB because duplicates are unlikely*."
As for security, the problem isn't "it's not good for security because the results can be predicted with sufficient effort". The problem is that no-one gives you a documented security guarantee.
In practice, according to this comment and this one, the GUID generation is implemented in terms of a cryptographically secure RNG (CryptGenRandom). But that appears to be an undocumented implementation detail. (And I haven't verified this; these are random comments on the Internet, so take them with a truckload of salt.)
(* Where "unlikely" means something like "the chances of anyone finding a duplicate GUID before the end of the universe are less than the chances of you personally winning the lottery." Implementation bugs excepted, of course.)
APIs that produce random bytes but which are not explicitly documented to produce cryptographically strong random bytes cannot be trusted to produce cryptographically strong random bytes.
If you need cryptographically strong random bytes, then you should be using an API which is explicitly documented to produce them.
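As an illustration of the principle, here is what that looks like in Python, where the `secrets` module is explicitly documented as suitable for security-sensitive tokens. This is a sketch of the idea, not .NET code; in .NET the analogous documented APIs live under System.Security.Cryptography.

```python
import secrets

# 16 bytes (128 bits) from the OS cryptographic RNG, rendered as 32 hex characters.
session_id = secrets.token_hex(16)
print(session_id)
```

The point is that the security guarantee comes from the API's documentation, not from observing that the output "looks random".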
These GUIDs are simply 128 bits of cryptographic randomness. They are not structured, and they will not collide.
See this article for some of the math. Using "The General Birthday Formula", rearranging gives
$$n \approx \sqrt{2T \ln\left(\frac{1}{p}\right)}$$
where n is the number of chosen elements, T is the total number of elements (2^128), and p is the target probability that all n chosen elements will be different. With p = .99, this gives n = 2.61532104 * 10^18. This means that we can generate a billion truly random GUIDs per second within a system for a billion seconds (32 years), and have better than 99% chance at the end that each one is unique within the system.
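The figure can be reproduced with a few lines of Python, assuming the usual birthday approximation p ≈ exp(-n^2 / (2T)) solved for n. `max_ids_for_uniqueness` is a hypothetical helper name.

```python
import math

def max_ids_for_uniqueness(total, p_unique):
    """Approximate n such that n random draws from `total` values
    are all distinct with probability p_unique."""
    return math.sqrt(2 * total * math.log(1 / p_unique))

n = max_ids_for_uniqueness(2**128, 0.99)
print("{:.4e}".format(n))

# A billion GUIDs per second for a billion seconds is 10^18, which stays below n.
print(1e9 * 1e9 < n)
```

Note that this uses the full 128 bits, as the answer does. With the 122 random bits of a version 4 GUID, T shrinks by a factor of 64 and n by a factor of 8.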