Convert byte array into any base

2019-04-04 18:21发布

I have an array of bytes (any length), and I want to encode this array into string using my own base encoder. In .NET is standard Base64 encoder, but what if I want to encode the array in Base62, Base53 or Base13?

Is it even possible to create such universal base encoder?

I know I could do it the simple way, that is, for each byte reserve fixed number of chars (in case of Base62, that would be 5 chars), and do direct byte->chars encoding, but I would be wasting space, as 5 Base62 chars are able to contain more than 1 byte, but less than 2 bytes.

How should I write such an encoder? Or is there already some class for this?
And please note that I need universal decoder as well, otherwise this is useless to me.

Resources

As the solution is already known (use BigInteger), I would just like to put here some resources relating the BigInteger class, as it is not available in .NET 3.5:

Big integers in C#
http://intx.codeplex.com/
https://svn.apache.org/repos/asf/incubator/heraldry/libraries/csharp/openid/trunk/Mono/Mono.Math/BigInteger.cs
http://www.codeproject.com/KB/cs/BigInteger_Library.aspx
http://www.codeproject.com/KB/cs/biginteger.aspx

8条回答
闹够了就滚
2楼-- · 2019-04-04 18:29

BASE64 works well, because 64 is a power of 2 (2^6) so each character holds 6 bits of data, and 3 bytes (3 * 8 = 24 bits) can be encoded into 4 characters (4 * 6 = 24). The encoding & decoding can be down merely bit shifting bits.

For bases which do not align with a power of 2 (like your base 62 or Base 53), Then you must treat the message you are trying to encode as one long number and perform divison and modulo operations on it. You'd probably be better off using a Base32 encoding and squandering a bit of bandwidth.

查看更多
闹够了就滚
3楼-- · 2019-04-04 18:30

Another example to look at is Ascii85, used in Adobe PostScript and PDF documents. In Ascii85, 5 characters are used to encode 4 bytes. You can figure out the efficiency of this coding as (256^4)/(85^5) = 96.8%. This is the fraction of bit combinations that will actually be used.

So, for whatever new base you would want to use to encode your data, you want to look for a power that will get it just above a power of 256 if you're trying to maximize coding efficiency. This might not be easy for every base. Checking base 53 shows that the best you'll probably get is using 7 bytes to encode 5 bytes (93.6% efficiency), unless you feel like using 88 bytes to encode 63 bytes.

查看更多
SAY GOODBYE
4楼-- · 2019-04-04 18:35

A little late to the party, but...

Because your specification calls for an arbitrary number of bits, you must have an integer type that can work with an arbitrary number of bits. If you can't target .NET 4.0 you'll have to beg, borrow, or steal a BigInteger implementation somewhere (like .NET 4.0 perhaps).

public static class GenericBaseConverter
{
    public static string ConvertToString(byte[] valueAsArray, string digits, int pad)
    {
        if (digits == null)
            throw new ArgumentNullException("digits");
        if (digits.Length < 2)
            throw new ArgumentOutOfRangeException("digits", "Expected string with at least two digits");

        BigInteger value = new BigInteger(valueAsArray);
        bool isNeg = value < 0;
        value = isNeg ? -value : value;

        StringBuilder sb = new StringBuilder(pad + (isNeg ? 1 : 0));

        do
        {
            BigInteger rem;
            value = BigInteger.DivRem(value, digits.Length, out rem);
            sb.Append(digits[(int)rem]);
        } while (value > 0);

        // pad it
        if (sb.Length < pad)
            sb.Append(digits[0], pad - sb.Length);

        // if the number is negative, add the sign.
        if (isNeg)
            sb.Append('-');

        // reverse it
        for (int i = 0, j = sb.Length - 1; i < j; i++, j--)
        {
            char t = sb[i];
            sb[i] = sb[j];
            sb[j] = t;
        }

        return sb.ToString();

    }

    public static BigInteger ConvertFromString(string s, string digits)
    {
        BigInteger result;

        switch (Parse(s, digits, out result))
        {
            case ParseCode.FormatError:
                throw new FormatException("Input string was not in the correct format.");
            case ParseCode.NullString:
                throw new ArgumentNullException("s");
            case ParseCode.NullDigits:
                throw new ArgumentNullException("digits");
            case ParseCode.InsufficientDigits:
                throw new ArgumentOutOfRangeException("digits", "Expected string with at least two digits");
            case ParseCode.Overflow:
                throw new OverflowException();
        }

        return result;
    }

    public static bool TryConvertFromString(string s, string digits, out BigInteger result)
    {
        return Parse(s, digits, out result) == ParseCode.Success;
    }

    private enum ParseCode
    {
        Success,
        NullString,
        NullDigits,
        InsufficientDigits,
        Overflow,
        FormatError,
    }

    private static ParseCode Parse(string s, string digits, out BigInteger result)
    {
        result = 0;

        if (s == null)
            return ParseCode.NullString;
        if (digits == null)
            return ParseCode.NullDigits;
        if (digits.Length < 2)
            return ParseCode.InsufficientDigits;

        // skip leading white space
        int i = 0;
        while (i < s.Length && Char.IsWhiteSpace(s[i]))
            ++i;
        if (i >= s.Length)
            return ParseCode.FormatError;

        // get the sign if it's there.
        BigInteger sign = 1;
        if (s[i] == '+')
            ++i;
        else if (s[i] == '-')
        {
            ++i;
            sign = -1;
        }

        // Make sure there's at least one digit
        if (i >= s.Length)
            return ParseCode.FormatError;


        // Parse the digits.
        while (i < s.Length)
        {
            int n = digits.IndexOf(s[i]);
            if (n < 0)
                return ParseCode.FormatError;
            BigInteger oldResult = result;
            result = unchecked((result * digits.Length) + n);
            if (result < oldResult)
                return ParseCode.Overflow;

            ++i;
        }

        // skip trailing white space
        while (i < s.Length && Char.IsWhiteSpace(s[i]))
            ++i;

        // and make sure there's nothing else.
        if (i < s.Length)
            return ParseCode.FormatError;

        if (sign < 0)
            result = -result;

        return ParseCode.Success;
    }
}
查看更多
Viruses.
5楼-- · 2019-04-04 18:43

If performance is not an issue, use the BigInteger class in the background. You have a constructor for BigInteger that takes byte array, and you can then manually run loops of division and modulus to get the representation in other non-standard bases.

Also take a look at this.

查看更多
淡お忘
6楼-- · 2019-04-04 18:46

A post on CodeReview prompted me to create a RadixEncoding class which is able to handle encoding/decoding a byte array to/from a base-N string.

The class can be found in this Q&A thread, along with documentation on (and solutions for) a few edge cases when dealing with BigInteger, endian-ness support, and the class' overall performance

查看更多
啃猪蹄的小仙女
7楼-- · 2019-04-04 18:48

You can get inspiration from C# implementation of Base32 implementation by Michael Giagnocavo.

查看更多
登录 后发表回答