I have been using AES-CBC for encryption and I use a random IV each time I encrypt plain text. As far as I can tell, this is the recommended approach.
I have been looking into AES-GCM / AES-CTR, primarily for the AEAD. I have not yet implemented anything with them, but from everything I have read, the nonce is basically just a shorter IV, and there is an internal counter that is incremented for each block. The developer needs to make sure the nonce changes before the 32-bit counter cycles back; otherwise the same nonce (IV) may be used again with the same key, which could leak the plaintext and key material.
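For reference, this is roughly how GCM combines the 96-bit nonce with its 32-bit counter (following NIST SP 800-38D for the 96-bit IV case). The nonce stays fixed for one message, and the counter in the last 32 bits advances once per 16-byte block of that message, so the counter limits the length of a single message rather than the number of messages. This is a minimal sketch that only computes the counter blocks; no AES is performed, and the helper names are hypothetical.

```python
# Sketch: derive AES-GCM counter blocks from a 96-bit nonce (no AES involved).

def initial_counter_block(nonce: bytes) -> bytes:
    """J0 = nonce || 0x00000001 for a 12-byte (96-bit) nonce."""
    if len(nonce) != 12:
        raise ValueError("this sketch only handles 96-bit nonces")
    return nonce + (1).to_bytes(4, "big")

def inc32(block: bytes) -> bytes:
    """Increment only the rightmost 32 bits modulo 2^32; the nonce part is untouched."""
    prefix, ctr = block[:12], int.from_bytes(block[12:], "big")
    return prefix + ((ctr + 1) % 2**32).to_bytes(4, "big")

def counter_blocks(nonce: bytes, n_blocks: int) -> list:
    """Counter blocks that AES would encrypt to form the keystream for n_blocks.

    J0 itself is reserved for masking the authentication tag, so the data
    keystream starts at inc32(J0)."""
    block = initial_counter_block(nonce)
    out = []
    for _ in range(n_blocks):
        block = inc32(block)
        out.append(block)
    return out

nonce = bytes(12)  # all-zero nonce, for illustration only
for cb in counter_blocks(nonce, 3):
    print(cb.hex())
```

Encrypting the same key with the same nonce twice reproduces exactly these counter blocks, and therefore exactly the same keystream.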
What I don't really understand is why AES-CBC can be fine with a random IV while some of what I have read indicates that a random nonce (IV) for AES-GCM is a bad idea. The only explanation I can think of is that the IV for AES-CBC (128 bits) is longer than the nonce for AES-GCM (96 bits), so the likelihood of a duplicate nonce is greater for AES-GCM.
I need to encrypt data that is anywhere from a few bytes to 10-20 GB. I know AES-GCM has a limit (~64 GiB) on the amount of data it can encrypt under one nonce before the counter cycles. This limitation doesn't affect me, since my data is below it.
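The size limit can be checked with a little arithmetic. NIST SP 800-38D caps the plaintext of one GCM encryption at 2^39 - 256 bits, which is 2^32 - 2 AES blocks, or just under 64 GiB (about 68.7 GB). The sketch below assumes that bound; `fits_in_one_gcm_message` is a hypothetical helper name.

```python
# Per-message plaintext limit for AES-GCM, per NIST SP 800-38D: len(P) <= 2^39 - 256 bits.
# The limit applies to each (key, nonce) pair; a new nonce restarts the block counter.
MAX_PLAINTEXT_BITS = 2**39 - 256
MAX_PLAINTEXT_BYTES = MAX_PLAINTEXT_BITS // 8   # 2^36 - 32 bytes
AES_BLOCK_BYTES = 16

def fits_in_one_gcm_message(n_bytes: int) -> bool:
    """True if n_bytes of plaintext can be encrypted under a single nonce."""
    return 0 <= n_bytes <= MAX_PLAINTEXT_BYTES

print(MAX_PLAINTEXT_BYTES)                      # 68719476704
print(MAX_PLAINTEXT_BYTES // AES_BLOCK_BYTES)   # 4294967294 blocks, i.e. 2^32 - 2
print(MAX_PLAINTEXT_BYTES / 2**30)              # just under 64 GiB
print(fits_in_one_gcm_message(20 * 10**9))      # True: 20 GB is within the limit
```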
Can someone shed some light on why a random nonce is not suggested for AES-GCM?
GCM is based on CTR mode and inherits the many-time pad (or two-time pad) problem if a nonce is reused with the same key (very nice example). If the IV is reused in CBC mode, then the only thing that an observer can detect is the equality of message prefixes.
With CBC mode, an observer can detect that a previously sent message is sent again, which might not give them much. CTR mode, however, lets them deduce the contents of a message if some information about the structure of the content is known.
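A small illustration of the two-time pad problem. As an assumption, `os.urandom` stands in for the AES-CTR keystream that one (key, nonce) pair produces; reusing the nonce with the same key means reusing exactly that keystream. The messages are made-up examples with a known structure.

```python
import os

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Stand-in for the keystream of one (key, nonce) pair; both messages reuse it.
keystream = os.urandom(32)

p1 = b"PAY 0100 USD TO ACCOUNT 12345678"
p2 = b"PAY 9900 USD TO ACCOUNT 87654321"
c1 = xor(p1, keystream)
c2 = xor(p2, keystream)

# 1. The keystream cancels out: c1 XOR c2 == p1 XOR p2, no key required.
#    Wherever the messages share structure, the XOR is zero.
assert xor(c1, c2) == xor(p1, p2)

# 2. Knowing (or guessing) p1 reveals the keystream, and with it p2.
recovered_keystream = xor(c1, p1)
print(xor(c2, recovered_keystream))   # b'PAY 9900 USD TO ACCOUNT 87654321'
```

With CBC and a repeated IV, the same two messages would only reveal that their first block (`PAY 0100 USD TO` vs `PAY 9900 USD TO`) differs.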
A nonce for AES-GCM mode is expected to be 96 bits long. If you generate nonces randomly, you can expect a duplicate nonce after roughly 2^(n/2) = 2^48 messages (see Birthday problem). More precisely, the probability of a duplicate nonce is about 40% after 2^48 encrypted messages with the same key, and reaches 50% at about 1.18 × 2^48 messages. That is quite a lot of messages, but a collision can happen earlier.
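The birthday estimate can be computed directly. This sketch uses the standard approximation p ≈ 1 - exp(-k(k-1)/(2N)) for k random values drawn from N = 2^96 possibilities; the function names are hypothetical.

```python
import math

def collision_probability(messages: int, nonce_bits: int = 96) -> float:
    """Approximate probability that at least two of `messages` random nonces collide."""
    n_values = 2.0 ** nonce_bits
    k = float(messages)
    # -expm1(-x) == 1 - exp(-x), but stays accurate when x is tiny
    return -math.expm1(-k * (k - 1) / (2 * n_values))

def messages_for_probability(p: float, nonce_bits: int = 96) -> float:
    """Approximate number of random nonces after which a collision has probability p."""
    return math.sqrt(2 * 2.0 ** nonce_bits * math.log(1 / (1 - p)))

print(collision_probability(2**48))            # about 0.39
print(messages_for_probability(0.5) / 2**48)   # about 1.18
print(collision_probability(2**32))            # about 1.2e-10, roughly 2^-33
```

The last line shows why a few billion messages per key is still comfortable with random 96-bit nonces, while 2^48 is not.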
GCM is a variation on Counter Mode (CTR). As you say, with any variant of Counter Mode, it is essential that the Nonce is not repeated with the same key. Hence CTR mode Nonces often include either a counter or a timer element: something that is guaranteed not to repeat over the lifetime of the key.
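A minimal sketch of a nonce with a counter element, in the spirit of the deterministic construction in NIST SP 800-38D (a fixed field followed by an invocation field). The 32-bit/64-bit split and the `CounterNonceSource` class are assumptions for illustration, not a library API. Uniqueness only holds if every sender sharing the key has a distinct fixed field and the counter is never reset or rolled back for that key.

```python
import threading

class CounterNonceSource:
    """Hypothetical helper: 96-bit nonce = 32-bit fixed field || 64-bit counter."""

    def __init__(self, fixed_field: int, start: int = 0):
        if not 0 <= fixed_field < 2**32:
            raise ValueError("fixed field must fit in 32 bits")
        self._fixed = fixed_field.to_bytes(4, "big")
        self._counter = start
        self._lock = threading.Lock()   # avoid two threads getting the same value

    def next_nonce(self) -> bytes:
        with self._lock:
            if self._counter >= 2**64:
                raise OverflowError("counter exhausted: rotate the key")
            value = self._counter
            self._counter += 1
        return self._fixed + value.to_bytes(8, "big")

source = CounterNonceSource(fixed_field=7)
print(source.next_nonce().hex())   # 000000070000000000000000
print(source.next_nonce().hex())   # 000000070000000000000001
```

In practice the counter state must be persisted with the key (for example across restarts), otherwise the guarantee is lost.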
If the Nonce is purely random then there is a small chance that it will repeat. That problem is easily avoidable, hence the advice not to use a random nonce.
In CBC mode the IV munges the contents of the first block. If the first block is not altered (or a fixed IV is used) then the encryption of the first block (only) is effectively in ECB mode, which is insecure. A random IV for CBC mode avoids this problem.
Hence the difference in treatments: CTR (and modes like GCM which are derived from it) need a guaranteed unique Nonce. Modes like CBC need a random IV.
Using a random IV / nonce for GCM has been specified as an official recommendation by, for instance, NIST. If anybody suggests otherwise, that's up to them.
The birthday problem greatly increases the chance of an IV collision when a random IV is used. With the default nonce size (12 bytes or 96 bits), however, the chance of a collision isn't high: it is still possible to encrypt over a billion files or messages before the mode becomes vulnerable.
If GCM becomes vulnerable, an adversary may find the internal key used to generate the authentication tag (the GHASH subkey H). Besides that, as GCM uses CTR mode internally, the confidentiality of the data within the files / messages is likely lost. So if an IV is ever repeated, GCM fails spectacularly.
Careful protocol design and implementation of the random IV are required because of the above limitations and vulnerabilities. It is therefore advisable to use a well-tested cryptographic random number generator to generate the IV values.
More information on the generation of the IV and the resulting limitations is available in sections 8.2 and 8.3 of the NIST specification SP 800-38D on GCM.
The NIST specification requires the random part of the IV to be at least 96 bits long. Since using an IV of any size other than the default 96 bits is not advisable, there are no spare bits left to include other information within the IV.
For randomly generated IVs, NIST specifies a limit of 2^32 (about four billion) messages to be encrypted with the same key.
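A minimal sketch of random nonce generation that enforces this per-key limit, assuming Python's `secrets` module (which draws from the operating system's CSPRNG) as the well-tested random number generator. `RandomNonceSource` is a hypothetical helper; one instance must correspond to exactly one key, and its count must be persisted if the key outlives the process.

```python
import secrets

# Per-key limit for randomly generated IVs, per NIST SP 800-38D section 8.3.
MAX_RANDOM_NONCE_INVOCATIONS = 2**32

class RandomNonceSource:
    """Hypothetical helper: full 96-bit random nonces, at most 2^32 per key."""

    def __init__(self):
        self._issued = 0

    def next_nonce(self) -> bytes:
        if self._issued >= MAX_RANDOM_NONCE_INVOCATIONS:
            raise RuntimeError("invocation limit reached: generate a new key")
        self._issued += 1
        return secrets.token_bytes(12)   # 12 bytes = 96 bits, all random

src = RandomNonceSource()
print(src.next_nonce().hex())
```

Counting invocations is what keeps the collision probability at around 2^-33 or below; the random generator alone cannot enforce that.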