How to design a sequential hash-like function

Posted 2020-02-08 06:29

I want to develop something similar to jsfiddle, where the user can input some data, "save" it, and get a unique, random-looking URL that loads that data.

I don't want the public URLs to be sequential, because I don't want anyone to grab all of my entries, as some may be private. However, on the server I would like to save them in sequential order.

Is there a function or technique that converts a number into a 4-character hash without any collisions for up to 62 * 62 * 62 * 62 === 14776336 entries?

For example, the first entry will be named 1 on the server but iUew3 to the user; the next entry will be 2 on the server but ueGR to the user...

EDIT: I'm not sure if it's obvious, but this hash-like function needs to be reversible, because when the user requests ueGR the server needs to know to serve file 2.
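One possible construction that meets these constraints with the 62-character alphabet, offered as a sketch rather than a vetted scheme: multiply the sequential id by a constant that shares no factor with 62^4, reduce modulo 62^4 (which permutes 0 to 62^4 - 1, so there are no collisions), and write the result as 4 base-62 digits. Decoding multiplies by the modular inverse. The value of MULT and the names encode_id/decode_id/mod_inverse are arbitrary assumptions. This only hides the order: anyone who sees the codes of two consecutive ids can recover MULT, so it is obfuscation, not security.

```php
<?php
declare(strict_types=1);

const ALPHABET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
const SPACE = 14776336; // 62 ** 4 possible 4-character codes
const MULT = 9848567;   // arbitrary (assumption): odd and not a multiple of 31, so coprime to 62 ** 4

// Modular inverse of $a mod $m via the extended Euclidean algorithm.
function mod_inverse(int $a, int $m): int
{
    [$oldR, $r] = [$a, $m];
    [$oldS, $s] = [1, 0];
    while ($r !== 0) {
        $q = intdiv($oldR, $r);
        [$oldR, $r] = [$r, $oldR - $q * $r];
        [$oldS, $s] = [$s, $oldS - $q * $s];
    }
    if ($oldR !== 1) {
        throw new InvalidArgumentException('no inverse: gcd is not 1');
    }
    return (($oldS % $m) + $m) % $m;
}

// Sequential id (0 .. SPACE-1) -> scrambled 4-character base-62 code.
function encode_id(int $id): string
{
    $alphabet = ALPHABET;
    $n = ($id * MULT) % SPACE; // a permutation of 0 .. SPACE-1
    $code = '';
    for ($i = 0; $i < 4; $i++) {
        $code = $alphabet[$n % 62] . $code;
        $n = intdiv($n, 62);
    }
    return $code;
}

// 4-character code -> original sequential id.
function decode_id(string $code): int
{
    if (preg_match('/^[0-9A-Za-z]{4}$/', $code) !== 1) {
        throw new InvalidArgumentException("bad code: $code");
    }
    $n = 0;
    for ($i = 0; $i < 4; $i++) {
        $n = $n * 62 + strpos(ALPHABET, $code[$i]);
    }
    return ($n * mod_inverse(MULT, SPACE)) % SPACE;
}

echo encode_id(1), ' ', encode_id(2), ' ', decode_id(encode_id(2)), "\n";
```

The server keeps naming files 1, 2, 3, ... and only ever shows encode_id() of that number; a request for a code is mapped back with decode_id().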

6 answers
啃猪蹄的小仙女
Reply #2 · 2020-02-08 07:06

In my opinion, if you also keep the save time of each entry on the server, you can generate the hash as hash = func(id, time). With only hash = func(id), it's going to be too easy to work out.
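A minimal sketch of that idea, assuming a server-side secret and using HMAC-SHA256 as func (my choice; the answer doesn't name one). The helper make_code and the secret value are hypothetical. A code derived this way is not reversible on its own and can collide within 62^4 values, so the server still has to store code → id and check for an existing code before saving.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: derive a 4-character code from the entry id and its
// save time, keyed with a server-side secret so outsiders cannot recompute it.
function make_code(int $id, int $savedAt, string $secret): string
{
    $alphabet = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    // Raw 32-byte HMAC-SHA256 of "id|time".
    $mac = hash_hmac('sha256', $id . '|' . $savedAt, $secret, true);
    // Interpret the first 4 bytes as an unsigned 32-bit integer.
    $n = unpack('N', $mac)[1];
    $code = '';
    for ($i = 0; $i < 4; $i++) {
        $code .= $alphabet[$n % 62];
        $n = intdiv($n, 62);
    }
    return $code;
}

$code = make_code(2, 1581143160, 'change-me');
echo $code, "\n";
```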

Ridiculous、
Reply #3 · 2020-02-08 07:09

Here's how I implemented it. This is save.php (can someone tell me if there are any design flaws in it?):

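<?php

// Read the last used index and increment it.
// Note: this read-modify-write is not locked, so two simultaneous saves
// can end up with the same index.
$index = (int) file_get_contents('saves/data/placeholder');
$index++;
file_put_contents('saves/data/placeholder', $index);

// Pick a random 4-character public name that isn't taken yet.
// random_int() is a CSPRNG; rand() is predictable.
$string = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
do {
    $hash = $string[random_int(0, 61)] . $string[random_int(0, 61)] . $string[random_int(0, 61)] . $string[random_int(0, 61)];
} while (file_exists('saves/' . $hash));

// saves/<hash> stores the index; saves/data/<index> stores the data.
file_put_contents('saves/' . $hash, $index);
file_put_contents('saves/data/' . $index, $_REQUEST['data']);

echo $hash;

?>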

And here's load.php:

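<?php

// Public 4-character name requested by the user.
$file = isset($_REQUEST['file']) ? $_REQUEST['file'] : '';

// Accept only names of the form save.php generates, so a request such as
// "data/placeholder" or "../something" cannot read other files.
if (!preg_match('/^[0-9A-Za-z]{4}$/', $file) || !file_exists('saves/' . $file)) {
    file_put_contents('saves/data/log', 'requested saves/' . $file . "\n", FILE_APPEND);
    die();
}
// saves/<name> contains the sequential index of the entry.
$file_pointer = (int) file_get_contents('saves/' . $file);

if (!file_exists('saves/data/' . $file_pointer)) {
    file_put_contents('saves/data/log', 'requested saves/data/' . $file_pointer . ' from ' . $file . "\n", FILE_APPEND);
    die();
}
echo file_get_contents('saves/data/' . $file_pointer);

?>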

Hope this helps others.
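One design flaw in save.php is that it reads and rewrites saves/data/placeholder without a lock, so two simultaneous requests can get the same index. A minimal sketch of a locked increment using PHP's flock(); the helper name next_index is hypothetical and it works on whatever counter file you pass in.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: read, increment and write back a counter file while
// holding an exclusive lock, so concurrent requests wait instead of racing.
function next_index(string $counterFile): int
{
    $fp = fopen($counterFile, 'c+'); // create if missing, don't truncate
    if ($fp === false) {
        throw new RuntimeException("cannot open $counterFile");
    }
    try {
        if (!flock($fp, LOCK_EX)) {
            throw new RuntimeException("cannot lock $counterFile");
        }
        $index = (int) stream_get_contents($fp) + 1; // empty file counts as 0
        ftruncate($fp, 0);
        rewind($fp);
        fwrite($fp, (string) $index);
        fflush($fp);
        flock($fp, LOCK_UN);
        return $index;
    } finally {
        fclose($fp);
    }
}

$counter = tempnam(sys_get_temp_dir(), 'cnt');
echo next_index($counter), ' ', next_index($counter), "\n"; // prints "1 2"
```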

ゆ 、 Hurt°
Reply #4 · 2020-02-08 07:09

Here's a reversible lib that works w/ bcmath
http://blog.kevburnsjr.com/php-unique-hash

何必那么认真
Reply #5 · 2020-02-08 07:11

It's an odd set of constraints. I routinely use MD5 checksums to generate unique URLs from data. If the user doesn't already have the data, they can't guess the URLs.

I do understand about not wanting to use a database—if you've never used one before, the learning curve can be a little steep.

I don't understand the constraint about "storing things sequentially on the server." If you need to know the order in which the hashes are created, I'd simply put that information in a separate file. You might have to use file locking or some other kind of hack to make sure you can safely append each new hash to that file.
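A minimal sketch of keeping the creation order in a separate file, using the built-in FILE_APPEND | LOCK_EX flags of file_put_contents as the locking step. The helper record_order and the temporary log file are hypothetical.

```php
<?php
declare(strict_types=1);

// Append one hash per line to an order log. LOCK_EX holds an exclusive lock
// for the duration of the write, so concurrent appends don't interleave.
function record_order(string $logFile, string $hash): void
{
    if (file_put_contents($logFile, $hash . "\n", FILE_APPEND | LOCK_EX) === false) {
        throw new RuntimeException("cannot write $logFile");
    }
}

$log = tempnam(sys_get_temp_dir(), 'order');
record_order($log, md5('first entry'));
record_order($log, md5('second entry'));

// Line N of the file is the N-th hash created.
$order = file($log, FILE_IGNORE_NEW_LINES);
echo count($order), "\n"; // prints "2"
```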

If you want short URLs, you can either take a prefix of an MD5 checksum or take a CRC-32 and base64-encode it. Both will give you unique URLs with reasonably high probability.
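Both options sketched in PHP. The prefix length of 8 and the helper names are assumptions. A CRC-32 is 4 bytes, which base64-encodes to 6 characters once the "==" padding is dropped. With only 32 bits, collisions become likely after a few tens of thousands of entries (the birthday bound), so it is still worth checking for an existing code before saving.

```php
<?php
declare(strict_types=1);

// Short code from the first $len hex characters of the data's MD5 checksum.
function md5_prefix(string $data, int $len = 8): string
{
    return substr(md5($data), 0, $len);
}

// Short code from the data's CRC-32: pack it as 4 big-endian bytes,
// base64-encode, drop the "==" padding and make it URL-safe.
function crc32_code(string $data): string
{
    $b64 = base64_encode(pack('N', crc32($data)));
    return strtr(rtrim($b64, '='), '+/', '-_');
}

echo md5_prefix('hello'), ' ', crc32_code('hello'), "\n";
```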

欢心
Reply #6 · 2020-02-08 07:13

This can't really be reversible. The only way (the one used by URL shorteners and jsfiddle) is to store the generated hash (actually a digest) in a table or data structure of some sort and look it up on retrieval.

Why?

Going from, e.g., 128 characters of data to a 4-character digest, you lose a lot of information.
You cannot store the remaining data in magical cracks between those 4 bytes; there are none.
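A minimal in-memory sketch of that lookup approach: generate a random 4-character code, make sure it is unused, store code → data, and on retrieval simply look the code up. A real site would keep the table in files or a database rather than a PHP array; save_entry and load_entry are hypothetical names.

```php
<?php
declare(strict_types=1);

// Store $data under a fresh random 4-character code and return the code.
function save_entry(array &$table, string $data): string
{
    $alphabet = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    do {
        $code = '';
        for ($i = 0; $i < 4; $i++) {
            $code .= $alphabet[random_int(0, 61)]; // CSPRNG, unlike rand()
        }
    } while (isset($table[$code])); // retry if the code is already taken
    $table[$code] = $data;
    return $code;
}

// Retrieval: nothing is decoded, the code is simply looked up.
function load_entry(array $table, string $code): ?string
{
    return $table[$code] ?? null;
}

$table = [];
$code = save_entry($table, 'some fiddle');
echo $code, ' => ', load_entry($table, $code), "\n";
```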

欢心
Reply #7 · 2020-02-08 07:18

It's possible to do this, but I would suggest using 64 characters, as that makes it a lot easier: 4 six-bit characters = 24 bits.

Use a combination of these:

  • bit reordering
  • xor with a number
  • put it into a 24-bit maximal-length LFSR and run it for a couple of cycles.

An LFSR is highly recommended, as it scrambles the bits well; the rest are optional. All of these manipulations are reversible and guarantee that each output is unique.
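The two optional steps can be sketched like this for 24-bit values. Each function is its own inverse, so decoding applies them again in reverse order. The XOR key value and the choice of a full bit reversal as the "bit reordering" are assumptions for illustration.

```php
<?php
declare(strict_types=1);

const XOR_KEY = 0x5A3C96; // arbitrary 24-bit key (assumption)

// XOR with a fixed key; applying it twice gives back the input.
function xor24(int $x): int
{
    return $x ^ XOR_KEY;
}

// Reverse the order of the low 24 bits (bit 0 <-> bit 23, bit 1 <-> bit 22, ...).
// Reversing twice gives back the input.
function reverse24(int $x): int
{
    $r = 0;
    for ($i = 0; $i < 24; $i++) {
        $r = ($r << 1) | (($x >> $i) & 1);
    }
    return $r;
}

// Encode: reorder, then XOR. Decode: undo the XOR, then undo the reordering.
function scramble(int $x): int   { return xor24(reverse24($x)); }
function unscramble(int $y): int { return reverse24(xor24($y)); }

echo scramble(1), ' ', unscramble(scramble(1)), "\n";
```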

Once you have calculated the "shuffled" number, simply pack it into a binary string and encode it with base64_encode.
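The packing step on its own: a 24-bit number fits in exactly 3 bytes, and 3 bytes base64-encode to exactly 4 characters with no "=" padding, which is why 64 characters make this easy. pack24 and unpack24 are hypothetical names.

```php
<?php
declare(strict_types=1);

// 24-bit integer -> 3 big-endian bytes -> 4 base64 characters.
function pack24(int $x): string
{
    return base64_encode(pack('CCC', ($x >> 16) & 0xff, ($x >> 8) & 0xff, $x & 0xff));
}

// 4 base64 characters -> 3 bytes -> 24-bit integer.
function unpack24(string $s): int
{
    $b = unpack('C3', base64_decode($s, true));
    return ($b[1] << 16) | ($b[2] << 8) | $b[3];
}

echo pack24(0), ' ', pack24(0xFFFFFF), "\n"; // prints "AAAA ////"
```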

For decoding simply do the inverse of these operations.

Sample (a unique sequence of length 2^24):

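<?php

// One step of a 24-bit Galois LFSR. The toggle mask 0xe10000 has bit 23 set,
// which is what makes the step invertible.
function lfsr($x) {
    return ($x >> 1) ^ (($x & 1) ? 0xe10000 : 0);
}
// Scramble a number in [0, 2^24) and encode it as 4 base64 characters.
function to_4($x) {
    for ($i = 0; $i < 24; $i++)
        $x = lfsr($x);
    // 24 bits -> 3 bytes -> exactly 4 base64 characters, no padding.
    $str = pack("CCC", $x >> 16, ($x >> 8) & 0xff, $x & 0xff);
    return base64_encode($str);
}

// Inverse of lfsr(): bit 23 of the output is the bit that was shifted out.
function rev_lfsr($x) {
    $bit = $x & 0x800000;
    $x = $x ^ ($bit ? 0xe10000 : 0);
    return ($x << 1) + ($bit ? 1 : 0);
}
// Decode 4 base64 characters and undo the 24 LFSR steps.
function from_4($str) {
    $str = base64_decode($str);
    $x = unpack("C*", $str);
    $x = $x[1] * 65536 + $x[2] * 256 + $x[3];
    for ($i = 0; $i < 24; $i++)
        $x = rev_lfsr($x);
    return $x;
}

// Print the first 256 codes with their decoded values.
for ($i = 0; $i < 256; $i++) {
    $enc = to_4($i);
    echo $enc . " " . from_4($enc) . "\n";
}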

Output:

AAAA 0
kgQB 1
5ggD 2
dAwC 3
DhAH 4
nBQG 5
6BgE 6
ehwF 7
HCAO 8
jiQP 9
+igN 10
aCwM 11
EjAJ 12
gDQI 13
9DgK 14
ZjwL 15
OEAc 16
qkQd 17
3kgf 18
TEwe 19
NlAb 20
pFQa 21
0FgY 22

...

Note: for URLs, replace + and / with - and _.
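That substitution is a plain character swap with strtr() in both directions (the result matches the "base64url" alphabet of RFC 4648). The helper names are hypothetical.

```php
<?php
declare(strict_types=1);

// Standard base64 -> URL-safe alphabet.
function to_url_code(string $b64): string
{
    return strtr($b64, '+/', '-_');
}

// URL-safe code -> standard base64, ready for base64_decode().
function from_url_code(string $code): string
{
    return strtr($code, '-_', '+/');
}

echo to_url_code('+igN'), "\n"; // prints "-igN"
```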

Note: although this works, for a simple scenario like yours it's probably easier to generate random filenames until you find one that doesn't exist yet. Nobody cares about the number of the entry.
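A minimal sketch of the random-filename approach. It assumes PHP's fopen() mode 'x', which fails if the file already exists, so checking for the name and creating the file happen in one step instead of a separate file_exists() check. The helper save_random, the retry limit of 100, and the temporary directory are hypothetical. On a case-insensitive filesystem, names like "abcd" and "ABCD" refer to the same file; the 'x' mode simply makes such a clash fail and retry.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: write $data to a new file in $dir under a random
// 4-character name, retrying while the name is taken.
function save_random(string $dir, string $data): string
{
    $alphabet = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    for ($attempt = 0; $attempt < 100; $attempt++) {
        $name = '';
        for ($i = 0; $i < 4; $i++) {
            $name .= $alphabet[random_int(0, 61)];
        }
        // Mode 'x' creates the file and fails if it already exists.
        $fp = @fopen($dir . '/' . $name, 'x');
        if ($fp !== false) {
            fwrite($fp, $data);
            fclose($fp);
            return $name;
        }
    }
    throw new RuntimeException('no free name found');
}

$dir = sys_get_temp_dir() . '/saves_' . bin2hex(random_bytes(4));
mkdir($dir);
$name = save_random($dir, 'hello');
echo $name, "\n";
```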
