Generating unique ID's ( check or not )? [clos

2019-07-10 07:43发布

Considering youtube video url (for example):

e.g. :

http://www.youtube.com/watch?v=-JVkaMqD5mI&feature=related

I'm talking about the -JVkaMqD5mI part. ( length=11)

lets calc the options :

a-z = 26     |
A-Z = 26     |_______ >    26+26+10+2 = 64 optional chars in 11 places  = 64^11 = 73786976294838206464
0-9 = 10     |
-_ = 2       |

Im still wondering , when they generate a new ID for a new video , do they still check if already exists ?

Im sure they have some list( db or cache) of the "already generated ID's" ... ( and if they do , do they aquire the db each time ? or in cache ? or...?)

Or do they rely on the 1.355252...e-20 chances which is almost 0.( but still !=0)

What is the best practice solutions for this kind of situations ?

2条回答
倾城 Initia
2楼-- · 2019-07-10 08:17

For those purposes, usually something called a hash function is used. It creates a fixed-length data or strings from some other data, which can be any given length or type. It uses some algorithm for that. One example is the one you gave, encoding letters to numbers.

Hash functions are not that simple as they look. There can be a serious mathematical method behind them, and you can try to prove that they are perfect or minimal perfect (which is not that important for this example).

A perfect function is a hash function that cannot generate the same output for any two different imputs. If you have that kind of hash function, you do not have to check for duplicates. If you want to do that, you have to prove that your hash function is perfect.

查看更多
Melony?
3楼-- · 2019-07-10 08:32

Well, just because they are using an alphanumeric ID on the video, does not mean they are simply generating those characters at random. Just because that string looks like random garbage to you I assure you it is not random and there is lots of information hidden in there.

So quick answer: No, it is not feasable to generate a random sequence of letters then either a) hope for no collisions or b) check through possibly billions of records to see if you already have that.

Much easier to keep a central "last ID used" and have an algorithm that moves from "last ID used" to "next ID to use" in a way that is mathematically guaranteed to generate a previously unused ID. In the case of sequential ID numbers that formula is simply f(n+1) = f(n)+1 (eg. last ID used was 150, next one will be 151.. guaranteed unused so far) but you can devise your own formulas to suit your needs.

查看更多
登录 后发表回答