Is it possible to base64-encode a file in chunks?

2019-01-11 02:04发布

问题:

I'm trying to base64 encode a huge input file and end up with an text output file, and I'm trying to find out whether it's possible to encode the input file bit-by-bit, or whether I need to encode the entire thing at once.

This will be done on the AS/400 (iSeries), if that makes any difference. I'm using my own base64 encoding routine (written in RPG) which works excellently, and, were it not a case of size limitations, would be fine.

回答1:

It's not possible bit-by-bit but 3 bytes at a time, or multiples of 3 bytes at time will do!.

In other words if you split your input file in "chunks" which size(s) is (are) multiples of 3 bytes, you can encode the chunks separately and piece together the resulting B64-encoded pieces together (in the corresponding orde, of course. Note that the last chuink needn't be exactly a multiple of 3 bytes in size, depending on the modulo 3 value of its size its corresponding B64 value will have a few of these padding characters (typically the equal sign) but that's ok, as thiswill be the only piece that has (and needs) such padding.

In the decoding direction, it is the same idea except that you need to split the B64-encoded data in multiples of 4 bytes. Decode them in parallel / individually as desired and re-piece the original data by appending the decoded parts together (again in the same order).

Example:

"File" contents = "Never argue with the data." (Jimmy Neutron).
Straight encoding = Ik5ldmVyIGFyZ3VlIHdpdGggdGhlIGRhdGEuIiAoSmltbXkgTmV1dHJvbik=

Now, in chunks:
"Never argue     -->     Ik5ldmVyIGFyZ3Vl
with the         -->        IHdpdGggdGhl
data." (Jimmy Neutron) --> IGRhdGEuIiAoSmltbXkgTmV1dHJvbik=

As you see piece in that order the 3 encoded chunks amount the same as the code produced for the whole file.

Decoding is done similarly, with arbitrary chuncked sized provided they are multiples of 4 bytes. There is absolutely not need to have any kind of correspondance between the sizes used for encoding. (although standardizing to one single size for each direction (say 300 and 400) may makes things more uniform and easier to manage.



回答2:

It is a trivial effort to split any given bytestream into chunks.

You can base64 any chunk of bytes without problem.

The problem you are faced with is that unless you place specific requirements on your chunks (multiples of 3 bytes), the sequence of base64-encoded chunks will be different than the actual output you want.

In C#, this is one (sloppy) way you could do it lazily. The execution is actually deferred until string.Concat is called, so you can do anything you want with the chunked strings. (If you plug this into LINQPad you will see the output)

void Main()
{
    var data = "lorum ipsum etc lol this is an example!!";
    var bytes = Encoding.ASCII.GetBytes(data);
    var testFinal = Convert.ToBase64String(bytes);

    var chunkedBytes = bytes.Chunk(3);
    var base64chunks = chunkedBytes.Select(i => Convert.ToBase64String(i.ToArray()));
    var final = string.Concat(base64chunks);

    testFinal.Dump(); //output
    final.Dump(); //output
}
public static class Extensions
{
    public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> list, int chunkSize)
    {
        while(list.Take(1).Count() > 0)
        {
            yield return list.Take(chunkSize);
            list = list.Skip(chunkSize);
        }
    }
}

Output

bG9ydW0gaXBzdW0gZXRjIGxvbCB0aGlzIGlzIGFuIGV4YW1wbGUhIQ==
bG9ydW0gaXBzdW0gZXRjIGxvbCB0aGlzIGlzIGFuIGV4YW1wbGUhIQ==


回答3:

Hmmm, if you wrote the base64 conversion yourself you should have noticed the obvious thing: each sequence of 3 octets is represented by 4 characters in base64.

So you can split the base64 data at every multiple of four characters, and it will be possible to convert these chunks back to their original bits.

I don't know how character files and byte files are handled on an AS/400, but if it has both concepts, this should be very easy.

  • are text files limited in the length of each line?
  • are text files line-oriented, or are they just character streams?
  • how many bits does one byte have?
  • are byte files padded at the end, so that one can only create files that span whole disk sectors?

If you can answer all these questions, what exact difficulties do you have left?



标签: base64