Base10 to base64 url shortening

2019-02-13 23:22发布

I'm coding an url shortener function for a project in which I'm learning php, here is the code (btw I suppose that global here is not a good thing to do :P):

$alphabet = array(1 => "a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z",
                "A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z",
                "0","1","2","3","4","5","6","7","8","9","_","-");

function shorten($id){
    global $alphabet;
    $shortenedId = "";
    while($id>0){
        $remainder = $id % 64;
        $id = $id / 64;     
        $shortenedId = $alphabet[$remainder].$shortenedId;
    }
    return $shortenedId;
}

The code is taken from this Wikipedia article and adapted to php. My problem is that when I pass a multiple of 64 to the function I get a wrong (for my purpose) result, for instance 128 returns b which is not correct, it should have been aaa, but that's too long for a 3-digit number.

Also I'm starting to think that there's something wrong in this code, if I pass 1'000'000'000'000 as $id I get nItOq... I feel it's wrong because a url shortening service like bit.ly returns a 6 number id if I use it, and I don't think that this algorithm is better than theirs.

So, two questions:

  • do you spot any bug in the above code?
  • how to manage 64-multiple ids? Do I have to just ignore them and pass to the next one?

7条回答
叼着烟拽天下
2楼-- · 2019-02-13 23:39

These two functions are very convenient, thanks to @malhal:

function shorten_int($id)
{
    $id=dechex($id);
    $id=strlen($id)%2===0?hex2bin($id):hex2bin('0'.$id);
    $id=base64_encode($id);
    $id=strtr($id, array('/'=>'_', '+'=>'-', '='=>''));
    return $id;
}

function unshorten_int($id)
{
    $id=strtr($id, array('-'=>'+', '_'=>'/'));
    $id=base64_decode($id);
    $id=bin2hex($id);
    return base_convert($id, 16, 10);
}

echo shorten_int(43121111)."\n";
echo unshorten_int(shorten_int(43121111))."\n";
查看更多
Melony?
3楼-- · 2019-02-13 23:39

You can use the pack.

$int = 1129717211140920362;

$byte = pack('J*', $int);    
echo base64_encode($byte); //= D62P0WqzFCo=

It will result in D62P0WqzFCo=, it is correct, because the $int is an int64 and uses 64 bits. The Base64 uses 6 bits for each character, so they need ~11 characters.

To decode use:

$base64 = 'D62P0WqzFCo=';

$byte = base64_decode($base64);
echo unpack('J*',  $byte)[1]; //= 1129717211140920362

It will return 1129717211140920362. ;)


It was based in the answer on the Stackoverflow in Portuguese.

查看更多
混吃等死
4楼-- · 2019-02-13 23:41

In case you're looking for the opposite function to take a base64 number and convert to base10, here's some PHP based off of the JavaScript in this answer: How to convert base64 to base10 in PHP?

function lengthen($id) {
    $alphabet='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-';

    $number=0;
    foreach(str_split($id) as $letter) {
        $number=($number*64) + strpos($alphabet,$letter);
    }
    return $number;
}
查看更多
姐就是有狂的资本
5楼-- · 2019-02-13 23:41

Paul Greg created some PHP code that converts from Base-10 to another base. This can be tested and the code downloaded here:

http://www.pgregg.com/projects/php/base_conversion/base_conversion.php

I'm using this approach to convert the database row id's to Base-64. Once these numbers have been shortened they can be used in the URL. [details]

查看更多
Lonely孤独者°
6楼-- · 2019-02-13 23:43

Just a couple of little tweaks needed, the main two were to make the the alphabet zero indexed rather than one-indexed, and to subtract the remainder from the id before dividing

function shorten($id)
{
    $alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-';
    $shortenedId = '';
    while($id>0) {
        $remainder = $id % 64;
        $id = ($id-$remainder) / 64;     
        $shortenedId = $alphabet{$remainder} . $shortenedId;
    };
    return $shortenedId;
}

and here's a further modified version which... well I just like

function shorten($id, $alphabet='0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-')
{
    $base = strlen($alphabet);
    $short = '';
    while($id) {
        $id = ($id-($r=$id%$base))/$base;     
        $short = $alphabet{$r} . $short;
    };
    return $short;
}

EDIT: sorted concatenation to be the same as the OPs

查看更多
霸刀☆藐视天下
7楼-- · 2019-02-13 23:44

By the way, check out the base_convert() function (http://php.net/manual/en/function.base-convert.php):

echo base_convert(1000000000, 10, 36);

36 is the longest base it can convert to, though. But in the comments section I found this:

function dec2any( $num, $base, $index=false ) {
    if (! $base ) {
        $base = strlen( $index );
    } else if (! $index ) {
        $index = substr( "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" ,0 ,$base );
    }
    $out = "";
    for ( $t = floor( log10( $num ) / log10( $base ) ); $t >= 0; $t-- ) {
        $a = floor( $num / pow( $base, $t ) );
        $out = $out . substr( $index, $a, 1 );
        $num = $num - ( $a * pow( $base, $t ) );
    }
    return $out;
}

echo dec2any(1000000000, 64, "_-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789");

Maybe it will help?

查看更多
登录 后发表回答