I am trying to read a gzipped file (.gz) in Python and am having some trouble.
I used the gzip module to read it, but the file is encoded as UTF-8 text, so eventually it reads an invalid character and crashes.
Does anyone know how to read gzip files encoded as UTF-8? I know that there's a codecs module that can help, but I can't understand how to use it.
Thanks!
import string
import gzip
import codecs

f = gzip.open('file.gz','r')
engines = {}
line = f.readline()
while line:
    parsed = string.split(line, u'\u0001')
    #do some things...
    line = f.readline()
for en in engines:
    print(en)
I don't see why this should be so hard.
What are you doing exactly? Please explain "eventually it reads an invalid character".
It should be as simple as:
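The snippet that originally followed is not preserved here, so what follows is only a minimal sketch of one common approach: open the file in binary mode and wrap it in a UTF-8 stream reader from the codecs module. The function name `read_utf8_lines` is hypothetical, chosen for illustration.

```python
import codecs
import gzip


def read_utf8_lines(path):
    """Yield decoded (unicode) lines from a gzip-compressed UTF-8 file."""
    # Binary mode: gzip only decompresses, it does not try to decode.
    raw = gzip.open(path, 'rb')
    try:
        # codecs.getreader() returns a StreamReader class; wrapping the
        # binary stream makes iteration yield decoded text lines.
        reader = codecs.getreader('utf-8')(raw)
        for line in reader:
            yield line
    finally:
        raw.close()
```

Each line can then be split the way the question does, e.g. `line.rstrip(u'\n').split(u'\u0001')` instead of `string.split(...)`.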
EDITED
This answer works for Python 2. In Python 3, please see @SeppoEnarvi's answer at https://stackoverflow.com/a/19794943/610569 (it uses the rt mode for gzip.open).
This is possible in Python 3.3:
Notice that gzip.open() requires you to explicitly specify text mode ('t').
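A minimal sketch of the text-mode approach, assuming Python 3.3 or later. The helper name `parse_records` and its `sep` parameter are made up for illustration; the separator matches the `\u0001` used in the question.

```python
import gzip


def parse_records(path, sep='\u0001'):
    """Return each line of a gzipped UTF-8 file, split on `sep`."""
    records = []
    # 'rt' = read in text mode; `encoding` tells gzip.open how to
    # decode the decompressed bytes into str.
    with gzip.open(path, 'rt', encoding='utf-8') as f:
        for line in f:
            records.append(line.rstrip('\n').split(sep))
    return records
```

Without the `'t'`, `gzip.open()` defaults to binary mode and returns bytes, which is why the mode has to be spelled out.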
Maybe
In Pythonic form (Python 2.5 or greater):
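The original snippet is missing, so this is a guess at what a `with`-statement version might look like rather than the answerer's actual code. It uses `contextlib.closing` because `GzipFile` only gained `with` support in Python 2.7; on 2.5 itself, `from __future__ import with_statement` would also be needed at the top of the file. The name `count_fields` is hypothetical.

```python
import codecs
import gzip
from contextlib import closing


def count_fields(path, sep=u'\u0001'):
    """Return the number of sep-delimited fields on each line."""
    counts = []
    # closing() guarantees raw.close() is called on exit, even on
    # versions where GzipFile is not itself a context manager.
    with closing(gzip.open(path, 'rb')) as raw:
        for line in codecs.getreader('utf-8')(raw):
            counts.append(len(line.rstrip(u'\n').split(sep)))
    return counts
```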
The above produced tons of decoding errors. I used this:
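The code this answer used is not shown here. One way to get past decoding errors, sketched below under that caveat, is to wrap the binary gzip stream in `io.TextIOWrapper` and pass an `errors` handler. The function name `read_lines_lenient` and the default of `'replace'` are assumptions for illustration; `'ignore'` would silently drop the bad bytes instead.

```python
import gzip
import io


def read_lines_lenient(path, errors='replace'):
    """Read a gzipped UTF-8 file, tolerating invalid byte sequences."""
    lines = []
    # TextIOWrapper decodes the decompressed bytes; `errors` decides
    # what happens to bytes that are not valid UTF-8. Closing the
    # wrapper also closes the underlying gzip file.
    with io.TextIOWrapper(gzip.open(path, 'rb'),
                          encoding='utf-8', errors=errors) as text:
        for line in text:
            lines.append(line)
    return lines
```

Note that this hides bad input rather than fixing it. If the errors are widespread, the file may not actually be UTF-8.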