A more compact representation than BASE64 for byte arrays

Posted 2019-07-21 10:19

For debugging I often find it useful to visualize byte arrays (for example hashed passwords) as BASE64 strings.

        public override string ToString()
        {
            return Convert.ToBase64String(this.Hash);      
        }

But for large hashes (say, more than 32 bytes), BASE64 produces quite a long string; a 32-byte hash alone already takes 44 characters. This makes it hard to compare two of them quickly just by looking at them.

BASE64 uses only 64 printable characters. I wonder whether there are other encodings that use more than 64 characters (but still only printable ones) to reduce the length needed to represent 32 bytes. It seems to me that we could do much better, since my keyboard alone offers 94 easily distinguishable printable characters.
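How much a larger alphabet can actually buy is easy to check by counting characters. The sketch below uses a hypothetical helper class (EncodedLength, not from any library) that computes the padded Base64 length, the Ascii85 length without the "<~ ~>" delimiters and without the 'z' shortcut, and the exact lower bound for an alphabet of a given size. It uses System.Numerics.BigInteger so the bound is not affected by floating-point rounding.

```csharp
using System.Numerics;

// Hypothetical helper for comparing encoded lengths; not part of any library.
static class EncodedLength
{
    // Base64 with '=' padding: 4 characters for every started group of 3 bytes.
    public static int Base64(int byteCount) => 4 * ((byteCount + 2) / 3);

    // Ascii85 body without "<~ ~>" delimiters and without the 'z' shortcut:
    // 5 characters per full 4-byte block, r + 1 characters for a final block of r bytes.
    public static int Ascii85(int byteCount)
    {
        int remainder = byteCount % 4;
        return 5 * (byteCount / 4) + (remainder > 0 ? remainder + 1 : 0);
    }

    // Lower bound for any encoding over an alphabet of the given size:
    // the smallest L with alphabetSize^L >= 2^(8 * byteCount), computed exactly.
    public static int Minimum(int byteCount, int alphabetSize)
    {
        BigInteger needed = BigInteger.One << (8 * byteCount);
        BigInteger capacity = BigInteger.One;
        int length = 0;
        while (capacity < needed)
        {
            capacity *= alphabetSize;
            length++;
        }
        return length;
    }
}
```

For a 32-byte hash this gives 44 characters for Base64, 40 for Ascii85, and a lower bound of 40 even for all 94 printable non-space ASCII characters. So at this size a 94-character alphabet could not beat Ascii85: each character carries about log2(94) ≈ 6.55 bits, compared with 6 bits for Base64.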

Of course, making byte arrays easily comparable by humans is not what BASE64 was originally intended for. But whatever works, right? ;)

1 answer
唯我独甜
#2 · 2019-07-21 11:05

You can use Ascii85. Wikipedia states:

Ascii85, also called Base85, is a form of binary-to-text encoding developed by Paul E. Rutter for the btoa utility. By using five ASCII characters to represent four bytes of binary data (making the encoded size ¹⁄₄ larger than the original, assuming eight bits per ASCII character), it is more efficient than uuencode or Base64, which use four characters to represent three bytes of data (¹⁄₃ increase, assuming eight bits per ASCII character).
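Here is a worked example of the four-bytes-to-five-characters step, computed by hand for the four ASCII bytes of "Man ". The bytes are read as one big-endian 32-bit number, rewritten in base 85, and each digit is offset by 33 so that digit 0 maps to '!'. The first line shows why the base is 85: it is the smallest base for which five digits cover every 32-bit value.

```latex
\begin{aligned}
84^5 = 4\,182\,119\,424 &< 2^{32} = 4\,294\,967\,296 \le 85^5 = 4\,437\,053\,125 \\[4pt]
N = \mathtt{0x4D616E20} &= 1\,298\,230\,816 \\
&= 24 \cdot 85^4 + 73 \cdot 85^3 + 80 \cdot 85^2 + 78 \cdot 85 + 61 \\
(24,\ 73,\ 80,\ 78,\ 61) + 33 &= (57,\ 106,\ 113,\ 111,\ 94)
\end{aligned}
```

ASCII codes 57, 106, 113, 111 and 94 are the characters 9, j, q, o and ^, so "Man " encodes as "9jqo^".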

You'll find a C# implementation on GitHub written by Jeff Atwood, who accompanied that code with a post on his blog.

Since you only need the encoder, I used Jeff's code as a starting point and created an implementation that contains only the encoding part:

using System.Text;

class Ascii85
{
    private const int _asciiOffset = 33;        // '!' stands for the base-85 digit 0
    private const int decodedBlockLength = 4;   // input bytes per block

    private byte[] _encodedBlock = new byte[5]; // output characters per block
    private uint _tuple;                        // current block as a big-endian 32-bit value

    /// <summary>
    /// Encodes binary data into a plaintext ASCII85 format string
    /// </summary>
    /// <param name="ba">binary data to encode</param>
    /// <returns>ASCII85 encoded string</returns>
    public string Encode(byte[] ba)
    {
        // Each started 4-byte block produces at most 5 characters.
        int capacity = (ba.Length + decodedBlockLength - 1) / decodedBlockLength * _encodedBlock.Length;
        StringBuilder sb = new StringBuilder(capacity);

        int count = 0;
        _tuple = 0;
        foreach (byte b in ba)
        {
            if (count >= decodedBlockLength - 1)
            {
                // The fourth byte completes the block.
                _tuple |= b;
                if (_tuple == 0)
                {
                    // An all-zero block is abbreviated to a single 'z'.
                    sb.Append('z');
                }
                else
                {
                    EncodeBlock(_encodedBlock.Length, sb);
                }
                _tuple = 0;
                count = 0;
            }
            else
            {
                // Bytes 1-3 of the block go into the high-order positions.
                _tuple |= (uint)b << (24 - (count * 8));
                count++;
            }
        }

        // A partial final block of count bytes is zero-padded
        // and written as count + 1 characters.
        if (count > 0)
        {
            EncodeBlock(count + 1, sb);
        }

        return sb.ToString();
    }

    // Writes the first count base-85 digits of _tuple, most significant first.
    private void EncodeBlock(int count, StringBuilder sb)
    {
        for (int i = _encodedBlock.Length - 1; i >= 0; i--)
        {
            _encodedBlock[i] = (byte)((_tuple % 85) + _asciiOffset);
            _tuple /= 85;
        }

        for (int i = 0; i < count; i++)
        {
            sb.Append((char)_encodedBlock[i]);
        }
    }
}

And here is the required attribution:

/// <summary>
/// adapted from the Jeff Atwood code to only have the encoder
/// 
/// C# implementation of ASCII85 encoding. 
/// Based on C code from http://www.stillhq.com/cgi-bin/cvsweb/ascii85/
/// </summary>
/// <remarks>
/// Jeff Atwood
/// http://www.codinghorror.com/blog/archives/000410.html
/// </remarks>