Encoding strings to small sizes for QRCode generation

Posted 2019-05-29 18:46

I'm generating QR codes from strings that could very easily be longer than a QR code can handle. I'm looking for suggestions on algorithms to encode these strings as compactly as possible, or a proof that the string cannot be shrunk any further.

Since I'm encoding a series of items, I can represent them by their IDs and delimit them with pipes and colons, using the following lookup table:

    function encodeLookUp(character){
        switch(character){
            case '0': return '0000';
            case '1': return '0001';
            case '2': return '0010';
            case '3': return '0011';
            case '4': return '0100';
            case '5': return '0101';
            case '6': return '0110';
            case '7': return '0111';
            case '8': return '1000';
            case '9': return '1001';
            case '|': return '1010';
            case ':': return '1011';
        }
        return false;
    }

Using this table I am already doing a base 16 encoding: each 8-bit ASCII character from the original string becomes a 4-bit code, i.e. half a character in the new string (effectively halving the length).

Starting String:  01251548|4654654:4465464 // ID1 | ID2 : ID3   demonstrates both pipes.
Bit String:  000000010010010100010101010010001010010001100101010001100101010010110100010001100101010001100100
Result String:  %H¤eFT´FTd // Half the length of the starting string.
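The packing step above can be sketched as follows. This is only an illustration, assuming Node.js; the names CODES, PAD, packNibbles and unpackNibbles are made up for the example, and padding odd-length input with the unused code 1111 is an assumption, not something the scheme above specifies.

```javascript
// Index of each symbol in CODES is its 4-bit code, matching the lookup table above.
const CODES = '0123456789|:';
const PAD = 0x0f; // assumption: unused code 1111 marks padding for odd-length input

function packNibbles(text) {
  const bytes = new Uint8Array(Math.ceil(text.length / 2));
  for (let i = 0; i < text.length; i++) {
    const code = CODES.indexOf(text[i]);
    if (code < 0) throw new Error(`unsupported character: ${text[i]}`);
    // Even positions fill the high nibble, odd positions the low nibble.
    bytes[i >> 1] |= (i % 2 === 0) ? code << 4 : code;
  }
  if (text.length % 2 === 1) bytes[bytes.length - 1] |= PAD;
  return bytes;
}

function unpackNibbles(bytes) {
  let text = '';
  for (const byte of bytes) {
    for (const code of [byte >> 4, byte & 0x0f]) {
      if (code === PAD) continue; // skip padding
      if (code >= CODES.length) throw new Error(`invalid code: ${code}`);
      text += CODES[code];
    }
  }
  return text;
}

const packed = packNibbles('01251548|4654654:4465464');
console.log(Buffer.from(packed).toString('hex')); // 01251548a4654654b4465464
```

In hex the digits map to themselves, the pipe becomes a and the colon becomes b, which makes the packed bytes easy to inspect. Some of the bytes are not printable, which is why the result string above looks shorter than 12 characters.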

This new ASCII string is then encoded according to the QR code specification.

EDIT: The maximum number of characters I can currently encode: 384

CLARIFICATION: Both the numeric length of each ID and the number of IDs or pipes are variable, with a tendency towards one. I want to refine this algorithm so that the 'result string' contains, on average, the fewest characters possible.

NOTE: The result string is only an ASCII representation of the binary string I've encoded the data into, so that it conforms with standard QR code specifications and readers.

5 answers
太酷不给撩
#2 · 2019-05-29 19:11

QR codes already have special encoding modes that are optimized for digits, or just alphanumeric data. It would probably be easier to take advantage of these rather than invent a scheme.
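As a rough way to compare the built-in modes, the payload cost of each can be computed directly. The sketch below uses the standard QR packing rules (numeric mode: 10 bits per 3 digits, with 7 or 4 bits for a remainder of 2 or 1; alphanumeric mode: 11 bits per 2 characters, 6 bits for a single leftover; byte mode: 8 bits per byte). The function names are made up for the example, and the counts ignore the mode indicator and character-count header of each segment.

```javascript
// Payload bits for n digits in QR numeric mode.
function numericModeBits(n) {
  return Math.floor(n / 3) * 10 + [0, 4, 7][n % 3];
}

// Payload bits for n characters in QR alphanumeric mode
// (0-9, A-Z, space and $ % * + - . / : only).
function alphanumericModeBits(n) {
  return Math.floor(n / 2) * 11 + (n % 2) * 6;
}

// Payload bits for n bytes in QR byte mode.
function byteModeBits(n) {
  return n * 8;
}

console.log(numericModeBits(22));      // 74  (the 22 digits of the sample, delimiters excluded)
console.log(alphanumericModeBits(24)); // 132 (all 24 characters, if '|' is swapped for an allowed symbol)
console.log(byteModeBits(12));         // 96  (the 4-bit packed form sent in byte mode)
```

Note that the alphanumeric character set includes ':' but not '|', so the pipe would have to be replaced by another symbol such as '/' to use that mode for the whole string.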

If you're going to do something custom, I think you'll find it hard to beat something like gzip compression. Just gzip the bytes, encode the bytes in byte mode, and decompress on the other end.
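A minimal round trip with gzip, assuming Node.js and its built-in zlib module, might look like this. The sample string is the one from the question; the variable names are illustrative.

```javascript
const zlib = require('zlib');

// Compress the raw ASCII bytes; the result would go into the QR code in byte mode.
const input = Buffer.from('01251548|4654654:4465464', 'ascii');
const compressed = zlib.gzipSync(input);

// The reader reverses the step after scanning.
const restored = zlib.gunzipSync(compressed).toString('ascii');

console.log(input.length, compressed.length, restored === input.toString('ascii'));
```

One caveat: the gzip container adds a fixed header and trailer (18 bytes in total), so for inputs as short as this sample the compressed form is likely to be larger than the input. It tends to pay off only on longer or more repetitive data.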

Emotional °昔
#3 · 2019-05-29 19:16

Using that function, you're going to lose a lot of space, since 4 bits is more storage than 12 combinations need (log2(12) is about 3.58 bits per symbol).
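One way to recover the wasted fraction of a bit per symbol, sketched here under the assumption of Node.js with BigInt support, is to treat the whole string as a single base-12 number and store that number in binary. The names SYMBOLS, packBase12 and unpackBase12 are made up for the example, and the leading sentinel digit is an assumption added so that leading '0' symbols survive the round trip.

```javascript
const SYMBOLS = '0123456789|:'; // 12 symbols, so each one is a base-12 digit

function packBase12(text) {
  let value = 1n; // sentinel so leading '0' symbols are not lost
  for (const ch of text) {
    const digit = SYMBOLS.indexOf(ch);
    if (digit < 0) throw new Error(`unsupported character: ${ch}`);
    value = value * 12n + BigInt(digit);
  }
  let hex = value.toString(16);
  if (hex.length % 2 === 1) hex = '0' + hex; // whole bytes only
  return Buffer.from(hex, 'hex');
}

function unpackBase12(bytes) {
  let value = BigInt('0x' + Buffer.from(bytes).toString('hex'));
  let text = '';
  while (value > 1n) { // stop when only the sentinel is left
    text = SYMBOLS[Number(value % 12n)] + text;
    value /= 12n;
  }
  return text;
}

const packed12 = packBase12('01251548|4654654:4465464');
console.log(packed12.length); // 11 bytes, versus 12 with 4 bits per symbol
```

The saving over the 4-bit table is modest (about 10%), so this is mostly useful when the payload is right at the size limit.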

I'd start by looking at the maximum length possible for your IDs and find a suitable storage block.

If you are storing these items serially in a fixed count (say, 4 IDs), you would need id_length*id_count at most, and you won't need any separators.

Edit: Again, depending on the number of IDs you want to write and their expected maximum length, there may be different types of encodings that compress it down further. RLE (run-length encoding) comes to mind.

Fickle 薄情
#4 · 2019-05-29 19:20

As a start of an answer to my own question:

If I start with a string of numbers, I can parse that string for patterns and represent those patterns with special symbols that take up the other 4 spaces available in my Huffman tree.

EDIT: Example: starting string 12222345, ending string 12x345, where x is a symbol that means 'repeat the last symbol 3 more times'.
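A small sketch of that idea, using the literal character 'x' as a stand-in for one of the spare 4-bit codes. The function names are made up for the example, and only the single 'repeat 3 more times' symbol from the example is implemented.

```javascript
// Replace each run of 3 repeats of the previous symbol with 'x'.
function rleEncode(text) {
  let out = '';
  let last = null; // last symbol the decoder will have seen
  let i = 0;
  while (i < text.length) {
    if (last !== null && text.startsWith(last.repeat(3), i)) {
      out += 'x';
      i += 3;
    } else {
      last = text[i];
      out += last;
      i += 1;
    }
  }
  return out;
}

// Expand each 'x' into 3 more copies of the previous symbol.
function rleDecode(encoded) {
  let out = '';
  let last = null;
  for (const ch of encoded) {
    if (ch === 'x') {
      if (last === null) throw new Error("'x' with no preceding symbol");
      out += last.repeat(3);
    } else {
      last = ch;
      out += ch;
    }
  }
  return out;
}

console.log(rleEncode('12222345')); // 12x345
console.log(rleDecode('12x345'));   // 12222345
```

This only helps when IDs actually contain runs of four or more identical symbols; on data without such runs the output is the same length as the input.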

可以哭但决不认输i
#5 · 2019-05-29 19:27

QR codes support a binary mode, and that's going to be the most efficient way for you to store your IDs. Either:

  1. Pick a length (in bytes) that is sufficient to store all your IDs, and encode the QR-code as a series of fixed-length integers. 4 bytes (32 bits) is a standard choice that ought to cover the likely range, or
  2. If you want to be able to encode a wide range of IDs, but expect most of the values to be small, use a variable-length encoding scheme. One example is to use the lowest 7 bits of each byte to store the integer, and the most significant bit to indicate if there are any further bytes.
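Option 2 can be sketched as follows. This assumes IDs are plain non-negative integers small enough to be safe JavaScript numbers, which means a leading zero such as the one in 01251548 is not preserved; the function names are made up for the example.

```javascript
// Encode one integer: 7 data bits per byte, least significant group first,
// with the top bit set on every byte except the last.
function encodeVarint(value) {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError('expected a non-negative safe integer');
  }
  const bytes = [];
  do {
    let byte = value % 128;
    value = Math.floor(value / 128);
    if (value > 0) byte |= 0x80; // more bytes follow
    bytes.push(byte);
  } while (value > 0);
  return bytes;
}

// Decode a byte sequence holding any number of back-to-back varints.
function decodeVarints(bytes) {
  const values = [];
  let value = 0;
  let scale = 1;
  for (const byte of bytes) {
    value += (byte & 0x7f) * scale;
    if (byte & 0x80) {
      scale *= 128;
    } else {
      values.push(value);
      value = 0;
      scale = 1;
    }
  }
  return values;
}

const ids = [1251548, 4654654, 4465464];
const encodedIds = ids.flatMap(encodeVarint);
console.log(encodedIds.length);         // 11 bytes for the three sample IDs
console.log(decodeVarints(encodedIds)); // [ 1251548, 4654654, 4465464 ]
```

Because each value carries its own end marker, no separator is needed between IDs. Distinguishing the two kinds of delimiter from the question would still need an extra convention on top of this.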

Also note that QR codes can be a lot larger than 384 characters!

Edit: From your original question, though, it looks like you're encoding more than just a series of integers - you have at least two different types of delimiters. Where can they appear and in what circumstances? The encoding format is going to depend on those parameters.

祖国的老花朵
#6 · 2019-05-29 19:32

If you have relatively non-random data, a Huffman encoding might be a good solution.
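As an illustration, a Huffman code table can be built from symbol frequencies like this. It is a sketch, not a tuned implementation: the name huffmanCodes is made up, the frequencies are taken from a single sample string, and a real scheme would have to fix the table in advance or ship it with the data, which costs space of its own.

```javascript
// Build a prefix-free code table (symbol -> bit string) from the
// symbol frequencies of the given text.
function huffmanCodes(text) {
  const freq = new Map();
  for (const ch of text) freq.set(ch, (freq.get(ch) || 0) + 1);

  const nodes = [...freq].map(([symbol, weight]) => ({ symbol, weight }));
  if (nodes.length === 0) return new Map();
  if (nodes.length === 1) return new Map([[nodes[0].symbol, '0']]);

  // Repeatedly merge the two lightest nodes until one tree remains.
  while (nodes.length > 1) {
    nodes.sort((a, b) => a.weight - b.weight);
    const [left, right] = nodes.splice(0, 2);
    nodes.push({ weight: left.weight + right.weight, left, right });
  }

  // Walk the tree: left edges append '0', right edges append '1'.
  const codes = new Map();
  (function walk(node, prefix) {
    if (node.symbol !== undefined) {
      codes.set(node.symbol, prefix);
      return;
    }
    walk(node.left, prefix + '0');
    walk(node.right, prefix + '1');
  })(nodes[0], '');
  return codes;
}

const sample = '01251548|4654654:4465464';
const table = huffmanCodes(sample);
const totalBits = [...sample].reduce((sum, ch) => sum + table.get(ch).length, 0);
console.log(table, totalBits);
```

For the sample string from the question, where the digit 4 is much more common than the others, the total comes out below the 96 bits of the fixed 4-bit table. The gain depends entirely on how skewed the real ID data is.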
