I want to create a URL shortener service where you can write a long URL into an input field and the service shortens the URL to "http://www.example.org/abcdef
".
Instead of "abcdef
" there can be any other string with six characters containing a-z, A-Z and 0-9
. That makes 56~57 billion possible strings.
My approach:
I have a database table with three columns:
- id, integer, auto-increment
- long, string, the long URL the user entered
- short, string, the shortened URL (or just the six characters)
I would then insert the long URL into the table. Then I would select the auto-increment value for "id
" and build a hash of it. This hash should then be inserted as "short
". But what sort of hash should I build? Hash algorithms like MD5 create too long strings. I don't use these algorithms, I think. A self-built algorithm will work, too.
My idea:
For "http://www.google.de/
" I get the auto-increment id 239472
. Then I do the following steps:
short = '';
if divisible by 2, add "a"+the result to short
if divisible by 3, add "b"+the result to short
... until I have divisors for a-z and A-Z.
That could be repeated until the number isn't divisible any more. Do you think this is a good approach? Do you have a better idea?
Due to the ongoing interest in this topic, I've published an efficient solution to GitHub, with implementations for JavaScript, PHP, Python and Java. Add your solutions if you like :)
If you don't want re-invent the wheel ... http://lilurl.sourceforge.net/
My Python 3 version
My approach: Take the Database ID, then Base36 Encode it. I would NOT use both Upper AND Lowercase letters, because that makes transmitting those URLs over the telephone a nightmare, but you could of course easily extend the function to be a base 62 en/decoder.
I keep incrementing an integer sequence per domain in the database and use Hashids to encode the integer into a URL path.
I ran a script to see how long it takes until it exhausts the character length. For six characters it can do
164,916,224
links and then goes up to seven characters. Bitly uses seven characters. Under five characters looks weird to me.Hashids can decode the URL path back to a integer but a simpler solution is to use the entire short link
sho.rt/ka8ds3
as a primary key.Here is the full concept:
You could hash the entire URL, but if you just want to shorten the id, do as marcel suggested. I wrote this Python implementation:
https://gist.github.com/778542