PHP URL Shortening Algorithm

2019-02-03 13:21发布

问题:

Could anyone recommend a preferred algorithm to use for URL shortening? I'm coding using PHP. Initially I thought about writing something that would start at a character such as "a" and iterate through requests, creating records in a database and therefore having to increment the character to b, c, d ... A, B and so on as appropriate.

However it dawned on me that this algorithm could be pretty heavy/clumsy and there could be a better way to do it.

I read around a bit on Google and some people seem to be doing it with base conversion from the database's ID column. This isn't something I'm too familiar with.

Could someone elaborate and explain to me how this would work? A couple of code examples would be great, too.

I obviously don't want a complete solution as I would like to learn by doing it myself, but just an explanation/pseudo-code on how this would work would be excellent.

回答1:

Most shortening services just use a counter that is incremented with every entry and convert the base from 10 to 64.

An implementation in PHP could look like this:

function encode($number) {
    return strtr(rtrim(base64_encode(pack('i', $number)), '='), '+/', '-_');
}
function decode($base64) {
    $number = unpack('i', base64_decode(str_pad(strtr($base64, '-_', '+/'), strlen($base64) % 4, '=')));
    return $number[1];
}

$number = mt_rand(0, PHP_INT_MAX);
var_dump(decode(encode($number)) === $number);

The encode function takes an integer number, converts it into bytes (pack), encodes it with the Base-64 encoding (base64_encode), trims the trailing padding = (rtrim), and replaces the characters + and / by - and _ respectively (strtr). The decode function is the inverse function to encode and does the exact opposite (except adding trailing padding).

The additional use of strtr is to translate the original Base-64 alphabet to the URL and filename safe alphabet as + and / need to be encoded with the Percentage-encoding.



回答2:

You can use base_convert function to do a base convertion from 10 to 36 with the database IDs.

<?php
   $id = 315;
   echo base_convert($id, 10, 36), "\n";
?>

Or you can reuse some of the ideas presented in the comments on the page bellow:

http://php.net/manual/en/function.base-convert.php



回答3:

i used to break ID by algorithm similar with how to convert from decimal to hex, but it will use 62 character instead of 16 character that hex would use.

'0','1','2','3','4','5','6','7','8','9',
'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',
'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'

example : if you will change ID = 1234567890 you will get kv7yl1 as your a key.



回答4:

Assuming your PRIMARY KEY is an INT and it auto_increments, the following code will get you going =).

<?php

    $inSQL = "INSERT INTO short_urls() VALUES();";
    $inResult = mysql_query($inSQL);
    $databaseID = base_convert(mysql_insert_id(), 10, 36);

    // $databaseID is now your short URL

?>

EDIT: Included the base_convert from HGF's answer. I forgot to base_convert in the original post.



回答5:

I adopted a "light" solution. On user request I generate a unique identifier (checking for conflicts in db) with this python snipplet:

url_hash = base64.b64encode(os.urandom(int(math.ceil(0.75*7))))[:6]

and store it in db.



回答6:

The native PHP base_convert() works well for small ranges of numbers, but if you really need to encode large values, consider using something like the implementation provided here which will work to base 64 and beyond if you simply provide more legal characters for the encoding.

http://af-design.com/blog/2010/08/10/working-with-big-integers-in-php/



回答7:

Here try this method :

hash_hmac('joaat', "http://www.example.com/long/url/", "secretkey");

It will provide you with hash value fit for a professional url shortener, e.g: '142ecd53'