Why do websites generate random alphanumeric strings for urls instead of using row ids?

Posted 2019-03-12 21:08

Question:

Why do many sites (YouTube is a good example) generate a string of random numbers and letters instead of using, for example, the row ID?

Usually it's something like this:

bla?v=wli4l73Chc0

instead of something like

bla?id=83934

Is it just to keep the URL short when you have many rows? Or are there other benefits? I can imagine that bla?id=23934234234 doesn't look so nice.

Thanks and cheers

Answer 1:

They are actually not random strings. Normally they are numbers (usually row IDs) encoded in Base-36 (this is not always the case, but many sites use it).

Why do they use it? Because a Base-36 encoded number string is shorter than the original.

For example: 1234567890 in Base-36 is kf12oi, which is 6 characters instead of 10, or 40% shorter.
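The conversion above can be reproduced with a few lines of Python. This is only a minimal sketch: the function names `to_base36` and `from_base36` are made up for illustration, and nothing here claims to be how any particular site does it.

```python
import string

# 0-9 followed by a-z gives the 36 symbols used by Base-36.
ALPHABET = string.digits + string.ascii_lowercase


def to_base36(n: int) -> str:
    """Encode a non-negative integer as a lowercase Base-36 string."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return "0"
    digits = []
    while n:
        n, rem = divmod(n, 36)
        digits.append(ALPHABET[rem])
    # Remainders come out least-significant first, so reverse them.
    return "".join(reversed(digits))


def from_base36(s: str) -> int:
    """Decode a Base-36 string; Python's int() accepts bases up to 36."""
    return int(s, 36)


print(to_base36(1234567890))  # kf12oi
print(from_base36("kf12oi"))  # 1234567890
```

Because the mapping is reversible, the server can decode the string back to the row ID before querying the database.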

See this Wikipedia article. Check the "Uses in practice" section to see who is using it.



Answer 2:

In a distributed environment, it is simpler to generate random numbers for identifiers than sequential numbers.
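To illustrate the point, here is a rough sketch of how independent servers could each mint identifiers without sharing a counter. The function name `new_public_id` and the choice of 8 random bytes are assumptions made for this example, not something any specific site is known to use.

```python
import secrets


def new_public_id(num_bytes: int = 8) -> str:
    """Return a random URL-safe identifier.

    8 random bytes encode to 11 URL-safe characters. No shared counter is
    needed, so any server can call this on its own.
    """
    return secrets.token_urlsafe(num_bytes)


# Two hypothetical servers generating ids with no coordination at all.
ids_from_server_a = [new_public_id() for _ in range(3)]
ids_from_server_b = [new_public_id() for _ in range(3)]
print(ids_from_server_a)
print(ids_from_server_b)
```

Collisions are very unlikely at this size but not impossible, so a unique constraint on the stored column (with a retry on conflict) is still a sensible safeguard.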



Answer 3:

I'm honestly not sure why they wouldn't use the unique ID (or ObjectID, or whatever the database provides). Have you considered that, rather than representing the ID in base 10, they might represent it in a higher base (such as 64, or whatever the URL character set allows) so that the ID is more compact in the query string? (Read: wli4l73Chc0 is some number in a base other than 10.)
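As a sketch of the higher-base idea, the snippet below writes an integer ID in the URL-safe Base-64 alphabet using Python's standard `base64` module. The helper names `int_to_b64url` and `b64url_to_int` are invented for this example, and whether any given site encodes its IDs this way is only a guess.

```python
import base64


def int_to_b64url(n: int) -> str:
    """Encode a non-negative integer as unpadded URL-safe Base-64."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    # Use the fewest whole bytes that can hold the number (at least one).
    length = max(1, (n.bit_length() + 7) // 8)
    raw = n.to_bytes(length, "big")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_to_int(s: str) -> int:
    """Reverse int_to_b64url by restoring the padding and decoding."""
    padded = s + "=" * (-len(s) % 4)
    return int.from_bytes(base64.urlsafe_b64decode(padded), "big")


row_id = 23934234234                # 11 decimal digits
encoded = int_to_b64url(row_id)     # 7 characters
print(encoded, len(encoded))
print(b64url_to_int(encoded) == row_id)
```

The saving grows with the size of the number, since each Base-64 character carries 6 bits while a decimal digit carries a little over 3.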



Answer 4:

I upvoted Rob's answer, but I'll also elaborate a bit on one of the risks.

If you publish a link like the one to this question (Why do websites generate random alphanumeric strings for urls instead of using row ids?), where 2581510 is a database id, someone trying to hack your site is going to try connecting to https://stackoverflow.com/questions/2581511.

With Stack Overflow, this may not be a database id, and the questions on Stack Overflow are not supposed to be private, so it's not a big deal even if it is.

But if this were a site where restricting data access to owners of the data were important, this potentially risks letting people see data they shouldn't.

There are of course things you can and should do to make the site refuse to show the data to anyone who doesn't own it, but it's still better for the url not to identify a database id. It's better, as Rob noted, to have a hash into some much larger domain, or a session-based index into a set of data already identified as appropriate to show the user and available only within a logged-in session.
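The "refuse to show the data if they don't own it" part might look something like the sketch below. Everything in it is hypothetical: the `Document` type, the in-memory `DOCUMENTS` table standing in for a database, and the `fetch_document` helper are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: int
    owner: str
    body: str


# Hypothetical in-memory stand-in for a database table.
DOCUMENTS = {
    1: Document(1, "alice", "alice's notes"),
    2: Document(2, "bob", "bob's notes"),
}


class NotFound(Exception):
    """Raised for both missing and not-owned documents."""


def fetch_document(doc_id: int, current_user: str) -> Document:
    doc = DOCUMENTS.get(doc_id)
    # Use the same error in both cases so a guessed id does not reveal
    # whether a row exists.
    if doc is None or doc.owner != current_user:
        raise NotFound(doc_id)
    return doc


print(fetch_document(1, "alice").body)
try:
    fetch_document(2, "alice")  # alice guessing the next id
except NotFound:
    print("not found")
```

With a check like this in place, guessing an adjacent id yields nothing, which is why a non-guessable URL is an extra layer rather than a replacement for authorization.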



Answer 5:

I would guess it's to obfuscate information and to add/increase the amount of information that can be passed via that parameter.



Answer 6:

Having raw row ids, or other unmodified database parameters in urls, is bad security practice. Far better to have hashes into some large domain.
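One way to read "hashes into some large domain" is to hand out a long random token per row and keep the row id on the server only. The sketch below takes that reading; the `TokenIndex` class is hypothetical, keeps its mapping in memory for brevity, and uses 16 random bytes as an arbitrary choice.

```python
import secrets
from typing import Optional


class TokenIndex:
    """Maps opaque public tokens to internal row ids and back."""

    def __init__(self) -> None:
        self._row_by_token: dict = {}
        self._token_by_row: dict = {}

    def token_for(self, row_id: int) -> str:
        """Return the public token for a row, creating it on first use."""
        token = self._token_by_row.get(row_id)
        if token is None:
            token = secrets.token_urlsafe(16)  # 22 URL-safe characters
            while token in self._row_by_token:  # retry on a collision
                token = secrets.token_urlsafe(16)
            self._row_by_token[token] = row_id
            self._token_by_row[row_id] = token
        return token

    def row_for(self, token: str) -> Optional[int]:
        """Return the row id for a token, or None if it is unknown."""
        return self._row_by_token.get(token)


index = TokenIndex()
public = index.token_for(83934)
print(index.row_for(public))    # 83934
print(index.row_for("83935"))   # None
```

Since the tokens are drawn from a space of 2^128 values, knowing one valid token tells an attacker nothing useful about the others.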



Answer 7:

Some environments also use this to establish state variables for the session. For example, if you have an ASP.NET app that uses cookieless sessions, you'll find a similar string in the URL.