Shortest possible encoded string with decode possi

Posted 2020-02-23 02:20

I'm looking for a method that encodes a string to the shortest possible length while keeping it decodable (pure PHP, no SQL). I have a working script, but I'm unsatisfied with the length of the encoded string.

SCENARIO:

Link to an image (depends on the file resolution I want to show to the user):

  • www.mysite.com/share/index.php?img=/dir/dir/hi-res-img.jpg&w=700&h=500

Encoded link (so the user can't guess how to get the larger image):

  • www.mysite.com/share/encodedQUERYstring

So, basically, I'd like to encode only the query-string part of the URL:

  • img=/dir/dir/hi-res-img.jpg&w=700&h=500

The method I use right now will encode the above query string to:

  • y8xNt9VPySwC44xM3aLUYt3M3HS9rIJ0tXJbcwMDtQxbUwMDAA

The method I use is:

 $raw_query_string = 'img=/dir/dir/hi-res-img.jpg&w=700&h=500';

 $encoded_query_string = base64_encode(gzdeflate($raw_query_string));
 $decoded_query_string = gzinflate(base64_decode($encoded_query_string)); 

How do I shorten the encoded result and still have the possibility to decode it using only PHP?

13 answers
我只想做你的唯一
#2 · 2020-02-23 02:48

I don't think the resulting URL can be shortened much more than in your own example. But I suggest a few steps to obfuscate your images better.

First I would remove everything you can from the base url you are zipping and base64encoding, so instead of

img=/dir/dir/hi-res-img.jpg&w=700&h=500

I would use

s=hi-res-img.jpg,700,500,062c02153d653119

where those last 16 chars are a hash used to validate that the URL being opened is the same one you offered in your code, and that the user is not trying to trick the high-res image out of the system.

Your index.php that serves the images would start like this:

function myHash($sRaw) { // returns 16 chars dual hash
    return hash('adler32', $sRaw) . strrev(hash('crc32', $sRaw));
} // These 2 hash algos are suggestions, there are more for you to choose from.

// s=hi-res-img.jpg,700,500,062c02153d653119
$aParams = explode(',', $_GET['s']);
if (count($aParams) != 4) {
    die('Invalid call.');
}

list($sFileName, $iWidth, $iHeight, $sHash) = $aParams;

$sRaw = session_id() . $sFileName . $iWidth . $iHeight;
// hash_equals() avoids the pitfalls of a loose != comparison on hex strings
if (!hash_equals(myHash($sRaw), $sHash)) {
    die('Invalid hash.');
}

After this point you can send the image, since the user opening it had a valid link.

Note that using session_id as part of the raw string that makes the hash is optional, but it would make it impossible for users to share a valid URL, as the URL would be session-bound. If you want the URLs to be shareable, just remove session_id from that call.
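
The snippet above only covers validation. A minimal sketch of the matching link-generation side could look like this, assuming the shareable variant (no session_id) and file names without commas. The helper name buildShareParam is hypothetical and not part of the original answer.

```php
<?php
// Same dual hash as in the answer: 8 hex chars (adler32) + 8 hex chars (crc32, reversed).
function myHash($sRaw) {
    return hash('adler32', $sRaw) . strrev(hash('crc32', $sRaw));
}

// Hypothetical helper: builds the value of the "s" parameter.
function buildShareParam($sFileName, $iWidth, $iHeight) {
    // Prepend session_id() here if the links should be session-bound.
    $sRaw = $sFileName . $iWidth . $iHeight;
    return implode(',', array($sFileName, $iWidth, $iHeight, myHash($sRaw)));
}

$s = buildShareParam('hi-res-img.jpg', 700, 500);
echo 's=' . $s . PHP_EOL;
```

Keep in mind that neither adler32 nor crc32 takes a key, so anyone who learns the recipe can recompute the hash; mixing a server-side secret into $sRaw would be one way to address that.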

I would wrap the resulting URL the same way you already do, zip + base64. The result would be even bigger than your version, but the obfuscation would be harder to see through, which protects your images from unauthorised downloads.

If you want only to make it shorter, I do not see a way of doing it without renaming the files (or their folders), or without the use of a database.

The file database solution proposed will surely create problems of concurrency - unless you always have no or very few people using the system simultaneously.

【Aperson】
#3 · 2020-02-23 02:49

Short words about "security"

You simply won't be able to secure your link if there is no "secret password" stored somewhere: as long as the URI carries all information to access your resource, then it will be decodable and your "custom security" (they are opposite words btw) will be broken easily.

You can still put a salt in your PHP code (like $mysalt="....long random string...") since I doubt you want an eternal security (such approach is weak because you cannot renew the $mysalt value, but in your case, few years security sounds sufficient, since anyway, a user can buy one picture and share it elsewhere, breaking any of your security mechanism).

If you want to have a safe mechanism, use a well-known one (as a framework would carry), along with authentication and user rights management mechanism (so you can know who's looking for your image, and whether they are allowed to).

Security has a cost, if you don't want to afford its computing & storing requirements, then forget about it.


Secure by signing the URL

If you want to stop users from easily bypassing the restriction and getting the full-res picture, you may just sign the URI (but really, for safety, use something that already exists instead of the quick draft example below):

// password_hash() is not usable for this: bcrypt costs only range from 4 to 31,
// the 'salt' option is ignored as of PHP 8.0, and a randomly salted hash cannot
// be compared with !==. A keyed HMAC is used here instead.
$salt = '....long random string...';
$params = array('img' => '...', 'h' => '...', 'w' => '...');
$p = http_build_query($params);
$check = hash_hmac('sha256', $p, $salt);
$uri = http_build_query(array_merge($params, array('sig' => $check)));

Decoding:

$sig = isset($_GET['sig']) ? (string) $_GET['sig'] : '';
$params = $_GET;
unset($params['sig']);

// Same as previous
$salt = '....long random string...';
$p = http_build_query($params);
$check = hash_hmac('sha256', $p, $salt);
if (!hash_equals($check, $sig)) throw new DomainException('Invalid signature');

See http://php.net/manual/en/function.hash-hmac.php


Shorten smartly

"Shortening" with a generic compression algorithm is useless here because the headers will be longer than the URI, so it will almost never shorten it.

If you want to shorten it, be smart: don't give the relative path (/dir/dir) if it's always the same (or give it only if it's not the main one). Don't give the extension if it's always the same (or give it when it's not png if almost everything is in png). Don't give the height because the image carries the aspect ratio: you only need the width. Give it in x100px if you do not need a pixel-accurate width.
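
As a rough sketch of that idea: the defaults below (/dir/dir/ and .jpg) and the helper names shortParam and expandParam are assumptions made for illustration, and the code assumes file names contain no commas and no dots other than the extension.

```php
<?php
// Assumed defaults (hypothetical): most images live in /dir/dir/ and end in .jpg.
const DEFAULT_DIR = '/dir/dir/';
const DEFAULT_EXT = '.jpg';

// Drop the default directory and extension, send the width in units of 100 px.
function shortParam($path, $width) {
    if (strpos($path, DEFAULT_DIR) === 0) {
        $path = substr($path, strlen(DEFAULT_DIR));
    }
    if (substr($path, -strlen(DEFAULT_EXT)) === DEFAULT_EXT) {
        $path = substr($path, 0, -strlen(DEFAULT_EXT));
    }
    return $path . ',' . (int) round($width / 100);
}

// Reverse step: restore whatever defaults were stripped.
function expandParam($short) {
    list($name, $w) = explode(',', $short);
    $path = ($name[0] === '/') ? $name : DEFAULT_DIR . $name;
    if (strpos(basename($path), '.') === false) {
        $path .= DEFAULT_EXT;
    }
    return array($path, (int) $w * 100);
}

echo shortParam('/dir/dir/hi-res-img.jpg', 700) . PHP_EOL; // hi-res-img,7
```

The height is left out on purpose, as suggested above: the server can derive it from the image's aspect ratio.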

Explosion°爆炸
#4 · 2020-02-23 02:52

Theory

In theory we need a short input character set and a large output character set. I will demonstrate it by the following example. We have the number 2468 as integer with 10 characters (0-9) as character set. We can convert it to the same number with base 2 (binary number system). Then we have a shorter character set (0 and 1) and the result is longer: 100110100100

But if we convert to hexadecimal number (base 16) with a character set of 16 (0-9 and A-F). Then we get a shorter result: 9A4
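
The two conversions can be checked with PHP's built-in base_convert(), which handles bases 2 to 36 and returns lowercase digits. This is only a quick illustration of the principle, not part of the encoding itself:

```php
<?php
$decimal = '2468';
$binary  = base_convert($decimal, 10, 2);   // smaller alphabet, longer string
$hex     = base_convert($decimal, 10, 16);  // larger alphabet, shorter string

printf("%s (%d chars)\n", $binary, strlen($binary)); // 100110100100 (12 chars)
printf("%s (%d chars)\n", $hex, strlen($hex));       // 9a4 (3 chars)
```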

Practice

So in your case we have the following character set for the input:

$inputCharacterSet = "0123456789abcdefghijklmnopqrstuvwxyz=/-.&";

In total 41 characters: Numbers, lower cases and the special chars = / - . &

The character set for the output is a bit tricky. We want to use URL-safe characters only. I've grabbed them from here: Characters allowed in GET parameter

So our output character set is (73 characters):

$outputCharacterSet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz~-_.!*'(),$";

Numbers, lower AND upper cases and some special chars.

We have more characters in our set for the output than for the input. Theory says we can shorten our input string. CHECK!
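
How much shorter can be estimated up front: each input character carries log(41) units of information and each output character can hold log(73). A small sketch (the function name estimatedLength is made up for this example) gives an upper bound on the output length:

```php
<?php
// Upper bound on the output length when re-encoding $length symbols
// from an alphabet of $fromSize characters into one of $toSize characters.
function estimatedLength($length, $fromSize, $toSize) {
    return (int) ceil($length * log($fromSize) / log($toSize));
}

echo estimatedLength(39, 41, 73) . PHP_EOL; // 34
echo estimatedLength(31, 40, 73) . PHP_EOL; // 27
```

The actual result can come out one character shorter than this bound, depending on the leading characters of the input.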

Coding

Now we need an encode function from base 41 to base 73. For that case I don't know a PHP function. Luckily we can grab the function 'convBase' from here: http://php.net/manual/de/function.base-convert.php#106546 (if someone knows a smarter function let me know)

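<?php
// Requires the bcmath extension. Caveat: leading characters that map to
// digit 0 (the first character of $fromBaseInput) are lost in a round trip.
function convBase($numberInput, $fromBaseInput, $toBaseInput)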
{
    if ($fromBaseInput==$toBaseInput) return $numberInput;
    $fromBase = str_split($fromBaseInput,1);
    $toBase = str_split($toBaseInput,1);
    $number = str_split($numberInput,1);
    $fromLen=strlen($fromBaseInput);
    $toLen=strlen($toBaseInput);
    $numberLen=strlen($numberInput);
    $retval='';
    if ($toBaseInput == '0123456789')
    {
        $retval=0;
        for ($i = 1;$i <= $numberLen; $i++)
            $retval = bcadd($retval, bcmul(array_search($number[$i-1], $fromBase),bcpow($fromLen,$numberLen-$i)));
        return $retval;
    }
    if ($fromBaseInput != '0123456789')
        $base10=convBase($numberInput, $fromBaseInput, '0123456789');
    else
        $base10 = $numberInput;
    if ($base10<strlen($toBaseInput))
        return $toBase[$base10];
    while($base10 != '0')
    {
        $retval = $toBase[bcmod($base10,$toLen)].$retval;
        $base10 = bcdiv($base10,$toLen,0);
    }
    return $retval;
}

Now we can shorten the URL. The final code is:

$input = 'img=/dir/dir/hi-res-img.jpg&w=700&h=500';
$inputCharacterSet = "0123456789abcdefghijklmnopqrstuvwxyz=/-.&";
$outputCharacterSet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz~-_.!*'(),$";
$encoded = convBase($input, $inputCharacterSet, $outputCharacterSet);
var_dump($encoded); // string(34) "BhnuhSTc7LGZv.h((Y.tG_IXIh8AR.$!t*"
$decoded = convBase($encoded, $outputCharacterSet, $inputCharacterSet);
var_dump($decoded); // string(39) "img=/dir/dir/hi-res-img.jpg&w=700&h=500"

The encoded string has only 34 characters.

Optimizations

You can reduce the number of characters by

  • reducing the length of the input string. Do you really need the overhead of URL parameter syntax? Maybe you can format your string as follows:

    $input = '/dir/dir/hi-res-img.jpg,700,500';

    This reduces the input itself AND the input character set. Your reduced input character set is then:

    $inputCharacterSet = "0123456789abcdefghijklmnopqrstuvwxyz/-.,";

    Final output:

    string(27) "E$AO.Y_JVIWMQ9BB_Xb3!Th*-Ut"

    string(31) "/dir/dir/hi-res-img.jpg,700,500"

  • reducing the input character set ;-). Maybe you can exclude some more characters? You can encode the numbers to characters first. Then your input character set can be reduced by 10!

  • increasing your output character set. The set I gave was googled within 2 minutes. Maybe you can use more URL-safe characters. No idea... Maybe someone has a list

Security

Heads up: There is no cryptographic logic in the code. So if somebody guesses the character sets, they can decode the string easily. But you can shuffle the character sets (once). Then it is a bit harder for the attacker, but not really safe. Maybe it's enough for your use case anyway.

Fickle 薄情
#5 · 2020-02-23 02:55

EDIT

Reading from the above and below comments, you need a solution to hide the real path of your image parser, giving it a fixed image width.

Step 1 : http://www.example.com/tn/full/animals/images/lion.jpg

You can achieve a basic "thumbnailer" by taking advantage of .htaccess

 RewriteEngine on
 RewriteBase /
 RewriteCond %{REQUEST_FILENAME} !-f
 RewriteRule tn/(full|small)/(.*) index.php?size=$1&img=$2 [QSA,L]

Your PHP file:

 $basedir="/public/content/";
 $filename=realpath($basedir.$_GET["img"]);

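 ## check that the file exists and is inside $basedir
 ## (realpath() returns false for a missing file; strncmp() returns 0 on a match)
 if ($filename === false || strncmp($filename, $basedir, strlen($basedir)) !== 0)
    die("Bad file path");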

 switch ($_GET["size"]) {
    case "full":
        $width=700;
        $height=500;
        ## you can also use getimagesize() to test if the image is landscape or portrait
    break;
    default:
        $width=350;
        $height=250;
    break;
 }
 ## here is your old code for resizing images
 ## Note that the "tn" directory can exist and store the actual reduced images

This lets you use the URL www.example.com/tn/full/animals/images/lion.jpg to view your reduced-size image.

This has the advantage for SEO to preserve the original file name.

Step 2 : http://www.example.com/tn/full/lion.jpg

If you want a shorter URL and you do not have too many images, you can use the basename of the file (e.g. "lion.jpg") and search for it recursively. On a collision, use an index to identify which file you want (e.g. "1--lion.jpg").

function matching_files($filename, $base) {
    $directory_iterator = new RecursiveDirectoryIterator($base);
    $iterator       = new RecursiveIteratorIterator($directory_iterator);
    // preg_quote() keeps dots and other special characters in the name literal
    $regex_iterator = new RegexIterator($iterator, '#' . preg_quote($filename, '#') . '$#');
    $regex_iterator->setFlags(RegexIterator::USE_KEY);
    // create_function() was removed in PHP 8; use a closure instead
    return array_map(function ($a) { return $a->getPathname(); }, iterator_to_array($regex_iterator, false));
}

function encode_name($filename) {
    $files=matching_files(basename($filename), realpath('public/content'));
    $tot=count($files);
    if (!$tot) return NULL;
    if ($tot==1) return basename($filename);
    return "/tn/full/".array_search(realpath($filename), $files)."--".basename($filename);
}

function decode_name($filename) {
    $i=0;
    if (preg_match("#^([0-9]+)--(.*)#", $filename, $out)) {
            $i=$out[1];
            $filename=$out[2];
    }

    $files=matching_files($filename, realpath('public/content'));

    return $files ? $files[$i] : NULL;
}

echo $name=encode_name("gallery/animals/images/lion.jpg").PHP_EOL;
 ## --> returns lion.jpg
 ## With the above solution you can use the url http://www.example.com/tn/full/lion.jpg

 echo decode_name(basename($name)).PHP_EOL;
 ## -> returns the full path on disk to the image "lion.jpg"

Original post:

Basically, if you line the two strings up, your shortened URL is in fact longer than the original:

img=/dir/dir/hi-res-img.jpg&w=700&h=500  // 39 chars
y8xNt9VPySwC44xM3aLUYt3M3HS9rIJ0tXJbcwMDtQxbUwMDAA // 50 chars

Using base64_encode will always result in longer strings. And gzcompress needs to store at least one occurrence of each distinct character, so it is not a good solution for small strings.
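
The Base64 overhead is easy to quantify: every 3 input bytes become 4 output characters. A quick check on the example query string, as a sketch:

```php
<?php
$raw = 'img=/dir/dir/hi-res-img.jpg&w=700&h=500';

// Base64 maps every 3 input bytes to 4 output characters (including padding).
$expected = 4 * (int) ceil(strlen($raw) / 3);
$encoded  = base64_encode($raw);

printf("raw: %d, base64: %d, predicted: %d\n", strlen($raw), strlen($encoded), $expected);
// raw: 39, base64: 52, predicted: 52
```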

So doing nothing (or a simple str_rot13) is clearly the first option to consider if you want to shorten the result you had previously.

You can also use a simple character replacement method of your choice:

 $raw_query_string = 'img=/dir/dir/hi-res-img.jpg&w=700&h=500';
 $from="0123456789abcdefghijklmnopqrstuvwxyz&=/ABCDEFGHIJKLMNOPQRSTUVWXYZ";
 // the following line is the result of str_shuffle($from)
 $to="0IQFwAKU1JT8BM5npNEdi/DvZmXuflPVYChyrL4R7xc&SoG3Hq6ks=e9jW2abtOzg";
 echo strtr($raw_query_string, $from, $to)."\n";

 // Result: EDpL4MEu4MEu4NE-u5f-EDp.dmprYLU00rNLA00 // 39 chars

Reading from your comment, what you really want is "to prevent anyone from getting a hi-res image".

The best way to achieve that is to generate a checksum with a private key.

Encode:

$secret="ujoo4Dae";
$raw_query_string = 'img=/dir/dir/hi-res-img.jpg&w=700&h=500';
$encoded_query_string = $raw_query_string."&k=".hash("crc32", $raw_query_string.$secret);

Result: img=/dir/dir/hi-res-img.jpg&w=700&h=500&k=2ae31804

Decode:

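// hash_equals() avoids loose == comparison, which can treat hex strings
// such as "0e123456" as numbers
if (preg_match("#(.*)&k=([^=]*)$#", $encoded_query_string, $out)
    && hash_equals(hash("crc32", $out[1].$secret), $out[2])) {
    $decoded_query_string=$out[1];
}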

This does not hide the original path, but the path has no reason to be public: your "index.php" can output the image from the local directory once the key has been checked.
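
CRC32 is a checksum rather than a cryptographic hash, so a sturdier variant of the same idea is a keyed HMAC. The sketch below keeps the "&k=" layout from above; the helper names signQuery and verifyQuery, the placeholder secret and the 16-character truncation are assumptions for illustration only.

```php
<?php
// Assumption: a server-side secret; this value is only a placeholder.
$secret = 'replace-with-a-long-random-secret';

// Append a truncated HMAC-SHA256 (16 hex chars = 64 bits) to keep the URL short.
function signQuery($query, $secret) {
    return $query . '&k=' . substr(hash_hmac('sha256', $query, $secret), 0, 16);
}

// Returns the original query string, or null if the signature does not match.
function verifyQuery($signed, $secret) {
    if (!preg_match('#^(.*)&k=([0-9a-f]{16})$#', $signed, $out)) {
        return null;
    }
    $expected = substr(hash_hmac('sha256', $out[1], $secret), 0, 16);
    // hash_equals() compares in constant time and avoids loose == comparison.
    return hash_equals($expected, $out[2]) ? $out[1] : null;
}

$signed   = signQuery('img=/dir/dir/hi-res-img.jpg&w=700&h=500', $secret);
$tampered = str_replace('w=700', 'w=9000', $signed);

var_dump(verifyQuery($signed, $secret) !== null);   // bool(true)
var_dump(verifyQuery($tampered, $secret) !== null); // bool(false)
```

Truncating the HMAC trades some security margin for length; whether 64 bits is enough depends on how valuable the images are.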

If you really want to shorten your original URL, you have to restrict the set of characters allowed in it. Many compression methods rely on the fact that a full byte can store more than one character.

Viruses.
#6 · 2020-02-23 02:55

I'm afraid you won't be able to shorten the query string better than any known compression algorithm. As already mentioned, a compressed version will be shorter than the original by only a few (around 4-6) characters. Moreover, the original string can be decoded relatively easily (as opposed to reversing sha1 or md5, for instance).

I suggest shortening URLs by means of Web server configuration. You might shorten it further by replacing image path with an ID (store ID-filename pairs in a database).

For example, the following Nginx configuration accepts URLs like /t/123456/700/500/4fc286f1a6a9ac4862bdd39a94a80858, where

  • the first number (123456) is supposed to be an image ID from database;
  • 700 and 500 are image dimensions;
  • the last part is an MD5 hash protecting from requests with different dimensions.

# Adjust maximum image size
# image_filter_buffer 5M;

server {
  listen          127.0.0.13:80;
  server_name     img-thumb.local;

  access_log /var/www/img-thumb/logs/access.log;
  error_log /var/www/img-thumb/logs/error.log info;

  set $root "/var/www/img-thumb/public";

  # /t/image_id/width/height/md5
  location ~* "(*UTF8)^/t/(\d+)/(\d+)/(\d+)/([a-zA-Z0-9]{32})$" {
    include        fastcgi_params;
    fastcgi_pass   unix:/tmp/php-fpm-img-thumb.sock;
    fastcgi_param  QUERY_STRING image_id=$1&w=$2&h=$3&hash=$4;
    fastcgi_param  SCRIPT_FILENAME /var/www/img-thumb/public/t/resize.php;

    image_filter resize $2 $3;
    error_page 415 = /empty;

    break;
  }

  location = /empty {
    empty_gif;
  }

  location / { return 404; }
}

The server accepts only URLs of specified pattern, forwards request to /public/t/resize.php script with modified query string, then resizes the image generated by PHP with image_filter module. In case of error, returns an empty GIF image.

The image_filter is optional, it is included only as an example. Resizing can be performed fully on PHP side. With Nginx, it is possible to get rid of PHP part, by the way.

The PHP script is supposed to validate the hash as follows:

// Store this in some configuration file.
$salt = '^sYsdfc_sd&9wa.';

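$w = $_GET['w'];
$h = $_GET['h'];
$image_id = (int)$_GET['image_id'];

$true_hash = md5($w . $h . $salt . $image_id);
// hash_equals() compares the two strings in constant time
if (!hash_equals($true_hash, (string)$_GET['hash'])) {
  die('invalid hash');
}

$filename = fetch_image_from_database($image_id);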
$img = imagecreatefrompng($filename);
header('Content-Type: image/png');
imagepng($img);
imagedestroy($img);
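
The script above only validates incoming requests. Generating a matching link could be sketched as follows, using the same formula for the hash; the helper name thumbUrl is hypothetical.

```php
<?php
// Same salt as in the validation script (store it in some configuration file).
$salt = '^sYsdfc_sd&9wa.';

// Hypothetical helper: builds /t/<image_id>/<width>/<height>/<md5>.
function thumbUrl($image_id, $w, $h, $salt) {
    // Must match the validation side: md5($w . $h . $salt . $image_id)
    $hash = md5($w . $h . $salt . $image_id);
    return sprintf('/t/%d/%d/%d/%s', $image_id, $w, $h, $hash);
}

$url = thumbUrl(123456, 700, 500, $salt);
echo $url . PHP_EOL;
```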
Melony?
#7 · 2020-02-23 02:56

I suspect that you will need to think more about your method of hashing if you don't want it to be decodable by the user. The issue with base64 is that a base64 string looks like a base64 string. There's a good chance that someone that's savvy enough to be looking at your page source will probably recognise it too.

Part one:

a method that encodes a string to the shortest possible length

If you're flexible on your URL vocab/characters, this will be a good starting place. Since gzip makes a lot of its gains using back references, there is little point as the string is so short.

Consider your example - you've only saved 2 bytes in the compression, which are lost again in base64 padding:

Non-gzipped: string(52) "aW1nPS9kaXIvZGlyL2hpLXJlcy1pbWcuanBnJnc9NzAwJmg9NTAw"

Gzipped: string(52) "y8xNt9VPySwC44xM3aLUYt3M3HS9rIJ0tXJbcwMDtQxbUwMDAA=="

If you reduce your vocab size, this will naturally allow you better compression. Let's say we remove some redundant information, such as the distinction between upper and lower case.

Take a look at the functions:

function compress($input, $ascii_offset = 38){
    $input = strtoupper($input);
    $output = '';
    //We can try for a 4:3 (8:6) compression (roughly), 24 bits for 4 chars
    foreach(str_split($input, 4) as $chunk) {
        $chunk = str_pad($chunk, 4, '=');

        $int_24 = 0;
        for($i=0; $i<4; $i++){
            //Shift the output to the left 6 bits
            $int_24 <<= 6;

            //Add the next 6 bits
            //Discard the leading ascii chars, i.e make
            $int_24 |= (ord($chunk[$i]) - $ascii_offset) & 0b111111;
        }

        //Here we take the 4 sets of 6 apart in 3 sets of 8
        for($i=0; $i<3; $i++) {
            $output = pack('C', $int_24) . $output;
            $int_24 >>= 8;
        }
    }

    return $output;
}

And

function decompress($input, $ascii_offset = 38) {

    $output = '';
    foreach(str_split($input, 3) as $chunk) {

        //Reassemble the 24 bit ints from 3 bytes
        $int_24 = 0;
        foreach(unpack('C*', $chunk) as $char) {
            $int_24 <<= 8;
            $int_24 |= $char & 0b11111111;
        }

        //Expand the 24 bits to 4 sets of 6, and take their character values
        for($i = 0; $i < 4; $i++) {
            $output = chr($ascii_offset + ($int_24 & 0b111111)) . $output;
            $int_24 >>= 6;
        }
    }

    //Make lowercase again and trim off the padding.
    return strtolower(rtrim($output, '='));
}

What's going on there is basically a removal of redundant information, followed by the compression of 4 bytes into 3. This is achieved by effectively having a 6-bit subset of the ascii table. This window is moved so that the offset starts at useful characters and includes all the characters you're currently using.

With the offset I've used, you can use anything from ASCII 38 to 101 (a 6-bit window of 64 characters). This gives you a resulting string of 30 bytes, which is a 9-byte (about 23%) compression! Unfortunately, you'll need to make it URL-safe (probably with base64), which brings it back up to 40 bytes.
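
For the URL-safe step, one common option is the "base64url" alphabet from RFC 4648, which swaps '+' and '/' for '-' and '_' and drops the padding. A minimal sketch, with helper names that are my own and random bytes standing in for the 30-byte compress() output:

```php
<?php
// URL-safe Base64: '+' and '/' become '-' and '_', trailing '=' padding is dropped.
function base64url_encode($bin) {
    return rtrim(strtr(base64_encode($bin), '+/', '-_'), '=');
}

function base64url_decode($text) {
    $pad = (4 - strlen($text) % 4) % 4; // restore the stripped '=' padding
    return base64_decode(strtr($text, '-_', '+/') . str_repeat('=', $pad));
}

$bin  = random_bytes(30);                   // stand-in for the compress() output
$text = base64url_encode($bin);

var_dump(strlen($text));                    // int(40)
var_dump(base64url_decode($text) === $bin); // bool(true)
```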

I think at this point, you're pretty safe to assume that you've reached the "security through obscurity" level required to stop 99.9% of people. Let's continue though, to the second part of your question:

so the user can't guess how to get the larger image

It's arguable that this is already solved with the above, but what you need to do is pass this through a secret on the server, preferably with php openssl. The following code shows the complete usage flow of functions above and the encryption:

$method = 'AES-256-CBC';
$secret = base64_decode('tvFD4Vl6Pu2CmqdKYOhIkEQ8ZO4XA4D8CLowBpLSCvA=');
$iv = base64_decode('AVoIW0Zs2YY2zFm5fazLfg==');

$input = 'img=/dir/dir/hi-res-img.jpg&w=700&h=500';
var_dump($input);

$compressed = compress($input);
var_dump($compressed);

$encrypted = openssl_encrypt($compressed, $method, $secret, false, $iv);
var_dump($encrypted);

$decrypted = openssl_decrypt($encrypted, $method, $secret, false, $iv);
var_dump($decrypted);

$decompressed = decompress($decrypted);
var_dump($decompressed);

The output of this script is the following:

string(39) "img=/dir/dir/hi-res-img.jpg&w=700&h=500"
string(30) "<��(��tJ��@�xH��G&(�%��%��xW"
string(44) "xozYGselci9i70cTdmpvWkrYvGN9AmA7djc5eOcFoAM="
string(30) "<��(��tJ��@�xH��G&(�%��%��xW"
string(39) "img=/dir/dir/hi-res-img.jpg&w=700&h=500"

You'll see the whole cycle: compression > encryption > base64 encode/decode > decryption > decompression. The result is about as close to the shortest possible length as you can realistically get.

Everything aside, I feel obliged to conclude this with the fact that it is theoretical only, and this was a nice challenge to think about. There are definitely better ways to achieve your desired result - I'll be the first to admit that my solution is a little bit absurd!
