I have a giant file, let's call it one-csv-file.xz. It is an XZ-compressed CSV file.
How can I open and parse through the file without first decompressing it to disk? What if the file is, for example, 100 GB? Python cannot read all of that into memory at once, of course. Will it page or run out of memory?
You can iterate through an
LZMAFile
objectYou can decompress incrementally. See Compression using the LZMA Algorithm. You create an
LZMADecompressor
object, and then use thedecompress
method with successive chunks of the compressed data to get successive chunks of the uncompressed data.