
Easiest way to compress in Python and decompress w

Posted 2020-07-24 05:52

Question:

I have a program with a Mono-based C# client and a Python server, which communicate over a TCP/IP socket. The messages use a mostly binary format but the largest part of each message is usually embedded UTF-8 strings (in English). Each message is typically short (under 100 bytes) but some can be longer (up to 64K). A lot of data is exchanged and I'd like to reduce message sizes and bandwidth usage by compressing the data when it's transmitted.

My initial research hasn't turned up anything that is obviously compatible across the 2 standard libraries. Python has a zlib library but I can't use C#'s DeflateStream or GZipStream (as they require an external DLL that I don't have available) and it doesn't seem to work with SharpZipLib's ZipOutputStream (giving out "error -3 - incorrect header" responses). (Those not using Mono might have more luck - see Duncan's answer below.)

I would be interested in hearing about easy ways to enable compression over this communications link, bearing in mind that any solution that may be easy to implement in one language needs to have an equivalent in the other. I'd accept a solution that is specialised towards the UTF-8 strings rather than the binary messages although the preference would be to compress the whole byte stream.

Ideally I'd like to keep external dependencies to a minimum, but I realise that may not be practical.

UPDATE: Having tried with SharpZipLib and encountered repeated errors on the Python decoding side, I could really do with concrete suggestions with code that is known to work rather than just suggestions of compression libraries for one language or the other.
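
A note on the "error -3 - incorrect header" message quoted above: it is what Python's zlib raises when it is handed data that does not start with a zlib header. A ZIP archive, which is the format a ZipOutputStream produces, starts with a "PK" file header instead, so zlib.decompress rejects it. The sketch below reproduces this entirely in Python, using the standard zipfile module as a stand-in for the C# side (an assumption for illustration; the entry name "msg.txt" is arbitrary). On the C# side, SharpZipLib's DeflaterOutputStream may be the closer match to Python's zlib, though that pairing is untested here.

```python
import io
import zipfile
import zlib

data = b"Line no 0.\n" * 10

# Build a small ZIP archive in memory (stand-in for what a ZIP writer emits).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("msg.txt", data)  # hypothetical entry name
archive = buf.getvalue()

# Feeding the archive to zlib fails: the bytes begin with "PK",
# which is not a valid zlib stream header.
try:
    zlib.decompress(archive)
    error_text = ""
except zlib.error as exc:
    error_text = str(exc)

# A genuine zlib stream round-trips without trouble.
roundtrip = zlib.decompress(zlib.compress(data))

# The archive itself has to be read with a ZIP reader.
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    restored = zf.read("msg.txt")
```

So the error points at a container-format mismatch (ZIP archive versus zlib stream) rather than at the compression algorithm itself.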

Answer 1:

BZip2 from SharpZipLib and Python's bz2 module worked together for me. Here's what I tested and how:

First, the C# program (referencing SharpZipLib):

using System;
using ICSharpCode.SharpZipLib.BZip2;
using System.IO;

namespace Test
{
    class MainClass
    {
        public static void Main(string[] args)
        {
            var fStream = new FileStream("/home/konrad/output.bin", FileMode.Create);
            using(var writer = new StreamWriter(new BZip2OutputStream(fStream)))
            {
                for(var i = 0; i < 10; i++)
                {
                    writer.WriteLine("Line no {0}.", i);
                }
            }
        }
    }
}

Then Python:

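import bz2
import sys

# Open in text mode so each line is decoded to str
# (the C# StreamWriter writes UTF-8 by default).
with bz2.open("/home/konrad/output.bin", "rt", encoding="utf-8") as f:
    for line in f:
        sys.stdout.write(line)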

Next, run the C# program, and after that:

$ python ctest.py
Line no 0.
Line no 1.
Line no 2.
Line no 3.
Line no 4.
Line no 5.
Line no 6.
Line no 7.
Line no 8.
Line no 9.

I assume it works the other way round as well.
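
For the other direction, and for socket messages rather than files, the Python side can work purely in memory. This is a minimal sketch using bz2.compress and bz2.decompress; the helper names pack and unpack are illustrative, and reading the result in C# (presumably with SharpZipLib's BZip2InputStream) is not something tested here.

```python
import bz2


def pack(payload: bytes) -> bytes:
    """Compress a message body into a complete bzip2 stream."""
    return bz2.compress(payload)


def unpack(blob: bytes) -> bytes:
    """Decompress a complete bzip2 stream back into the message body."""
    return bz2.decompress(blob)


# Same test data as the C# program above, built as UTF-8 bytes.
lines = "".join("Line no {0}.\n".format(i) for i in range(10)).encode("utf-8")
blob = pack(lines)
restored = unpack(blob)
```

Every bzip2 stream carries a fixed header, so for messages under 100 bytes it is worth measuring len(blob) against len(lines) before committing to this format.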



Answer 2:

You wrote:

Similarly both standard libraries offer gzip compression but Python expects to use file in this case, which is not practical.

That's not actually true. Python's gzip.GzipFile class takes either a filename or a fileobj. If you want to work in memory, just pass an in-memory buffer as the fileobj (StringIO on Python 2, io.BytesIO on Python 3):

from gzip import GzipFile
from io import BytesIO

buf = BytesIO()
# GzipFile writes the gzip stream into the in-memory buffer instead of a file.
with GzipFile(fileobj=buf, mode='wb') as gz:
    gz.write(b'uncompressed data')
compressed = buf.getvalue()
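
Reading the data back works the same way, with the buffer wrapped around the received bytes. A minimal sketch, assuming Python 3 (where the payload is bytes and the buffer is io.BytesIO); the helper names gzip_bytes and gunzip_bytes are illustrative:

```python
from gzip import GzipFile
from io import BytesIO


def gzip_bytes(data: bytes) -> bytes:
    """Compress data into a gzip stream held in memory."""
    buf = BytesIO()
    with GzipFile(fileobj=buf, mode="wb") as gz:
        gz.write(data)
    # The gzip trailer is written when the with-block closes the GzipFile,
    # so getvalue() must be called after it.
    return buf.getvalue()


def gunzip_bytes(blob: bytes) -> bytes:
    """Decompress an in-memory gzip stream."""
    with GzipFile(fileobj=BytesIO(blob), mode="rb") as gz:
        return gz.read()


compressed = gzip_bytes(b"uncompressed data")
original = gunzip_bytes(compressed)
```

The output starts with the gzip magic bytes 0x1f 0x8b, which is a quick check when debugging what the other end of the socket actually received.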


Answer 3:

It seems you're on *nix systems. If that's the case and all the other methods fail, you can simply use the system libraries (via Mono.Unix.Native) and not worry about finding suitable .NET libraries.



Answer 4:

I have used zlib for .NET in the past, and there are also libraries that wrap the native zlib library to provide a managed solution. I needed to do something similar to what you are doing. For smaller transfers I compressed directly in memory. For much larger files I zipped to a file, downloaded that file from a URL, and unzipped it from disk.
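
The in-memory approach for small transfers could look something like the sketch below on the Python side. The frame layout (a one-byte flag plus a four-byte length), the MIN_COMPRESS_SIZE threshold, and the function names are all assumptions made for illustration, not part of the setup described above. The idea is to skip compression when a message is too short to benefit, and to tell the receiver which case applies.

```python
import struct
import zlib

# Assumed cut-off: very short messages rarely shrink once the
# zlib header and checksum are added.
MIN_COMPRESS_SIZE = 128

# Hypothetical frame header: 1-byte flag (1 = compressed) and a
# 4-byte big-endian body length.
HEADER = struct.Struct("!BI")


def encode_message(payload: bytes) -> bytes:
    """Frame a payload, compressing it only when that makes it smaller."""
    flag, body = 0, payload
    if len(payload) >= MIN_COMPRESS_SIZE:
        candidate = zlib.compress(payload)
        if len(candidate) < len(payload):
            flag, body = 1, candidate
    return HEADER.pack(flag, len(body)) + body


def decode_message(frame: bytes) -> bytes:
    """Reverse encode_message for one complete frame."""
    flag, length = HEADER.unpack(frame[:HEADER.size])
    body = frame[HEADER.size:HEADER.size + length]
    return zlib.decompress(body) if flag else body


short_frame = encode_message(b"hi")
long_payload = b"Line no 1.\n" * 1000
long_frame = encode_message(long_payload)
```

The receiving side needs the matching logic: read the five header bytes, read that many body bytes, and inflate only when the flag is set.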