I would like to read (in Python 2.7), line by line, from a csv (text) file, which is 7z compressed. I don't want to decompress the entire (large) file, but to stream the lines.
I tried pylzma.decompressobj()
unsuccessfully. I get a data error. Note that this code doesn't yet read line by line:
input_filename = r"testing.csv.7z"
with open(input_filename, 'rb') as infile:
obj = pylzma.decompressobj()
o = open('decompressed.raw', 'wb')
obj = pylzma.decompressobj()
while True:
tmp = infile.read(1)
if not tmp: break
o.write(obj.decompress(tmp))
o.close()
Output:
o.write(obj.decompress(tmp))
ValueError: data error during decompression
This will allow you to iterate the lines. It's partially derived from some code I found in an answer to another question.
At this point in time (
pylzma-0.5.0
) thepy7zlib
module doesn't implement an API that would allow archive members to be read as a stream of bytes or characters — itsArchiveFile
class only provides aread()
function that decompresses and returns the uncompressed data in a member all at once. Given that, about the best that can be done is return bytes or lines iteratively via a Python generator using that as a buffer.The following does the latter, but may not help if the problem is the archive member file itself is huge.
The code below should work in Python 3.x as well as 2.7.
If you were using Python 3.3+, you might be able to do this using the
lzma
module which was added to the standard library in that version.See:
lzma
Examples