For Python 3, I followed @Martijn Pieters's code with this:
import gzip
import json
# writing
with gzip.GzipFile(jsonfilename, 'w') as fout:
    for i in range(N):
        uid = "whatever%i" % i
        dv = [1, 2, 3]
        data = json.dumps({
            'what': uid,
            'where': dv})
        fout.write(data + '\n')
but this results in an error:
Traceback (most recent call last):
  ...
  File "C:\Users\Think\my_json.py", line 118, in write_json
    fout.write(data + '\n')
  File "C:\Users\Think\Anaconda3\lib\gzip.py", line 258, in write
    data = memoryview(data)
TypeError: memoryview: a bytes-like object is required, not 'str'
Any thoughts about what is going on?
You have four steps of transformation here.
- a Python data structure (nested dicts, lists, strings, numbers, booleans)
- a Python string containing a serialized representation of that data structure ("JSON")
- a bytes object containing an encoded representation of that string ("UTF-8")
- a bytes object containing a compressed representation of that previous byte sequence ("gzip")
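The code in the question skips step 3: json.dumps() returns a str, but GzipFile.write() accepts only bytes-like objects, which is what the TypeError says. So let's take these steps one by one.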
import gzip
import json
data = []
for i in range(N):
    uid = "whatever%i" % i
    dv = [1, 2, 3]
    data.append({
        'what': uid,
        'where': dv
    }) # 1. data
json_str = json.dumps(data) + "\n" # 2. string (i.e. JSON)
json_bytes = json_str.encode('utf-8') # 3. bytes (i.e. UTF-8)
with gzip.GzipFile(jsonfilename, 'w') as fout: # 4. gzip
    fout.write(json_bytes)
Note that adding "\n" is superfluous here. The file holds a single JSON document and json.loads() ignores trailing whitespace, so the newline does not break anything, but it serves no purpose either.
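The newline does earn its keep if each record is written as its own JSON document, one per line, which is what the loop in the question was heading towards. The following is only a sketch of that variant, not part of the code above: the helper names `write_json_lines` and `read_json_lines`, the sample records, and the temporary file path are all made up for illustration.

```python
import gzip
import json
import os
import tempfile


def write_json_lines(path, records):
    """Write each record as one JSON document per line, gzip-compressed."""
    with gzip.GzipFile(path, 'w') as fout:
        for record in records:
            line = json.dumps(record) + '\n'   # the newline separates records
            fout.write(line.encode('utf-8'))   # GzipFile.write() needs bytes


def read_json_lines(path):
    """Read the records back, decoding and parsing one line at a time."""
    with gzip.GzipFile(path, 'r') as fin:
        return [json.loads(line.decode('utf-8')) for line in fin]


# Hypothetical sample data modelled on the question's loop.
records = [{'what': 'whatever%i' % i, 'where': [1, 2, 3]} for i in range(3)]

# Hypothetical location: a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'records.json.gz')
write_json_lines(path, records)
restored = read_json_lines(path)
```

With this layout the file can be processed record by record without loading everything into memory, at the cost of no longer being a single valid JSON document.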
Reading works exactly the other way around:
with gzip.GzipFile(jsonfilename, 'r') as fin: # 4. gzip
    json_bytes = fin.read() # 3. bytes (i.e. UTF-8)
json_str = json_bytes.decode('utf-8') # 2. string (i.e. JSON)
data = json.loads(json_str) # 1. data
print(data)
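To see the four representations side by side, the same round trip can be sketched entirely in memory. This swaps the GzipFile object for the module-level gzip.compress() and gzip.decompress() functions, so it is an illustration rather than the file-based code above, and the sample value of `data` is made up.

```python
import gzip
import json

data = [{'what': 'whatever0', 'where': [1, 2, 3]}]   # 1. data (sample value)

json_str = json.dumps(data)                 # 2. string (i.e. JSON)
json_bytes = json_str.encode('utf-8')       # 3. bytes (i.e. UTF-8)
gz_bytes = gzip.compress(json_bytes)        # 4. gzip

# And back again, undoing the steps in reverse order.
restored = json.loads(gzip.decompress(gz_bytes).decode('utf-8'))

# Show the Python type of each intermediate representation.
print(type(json_str).__name__, type(json_bytes).__name__, type(gz_bytes).__name__)
print(restored == data)
```

Only step 2 produces a str. Everything from step 3 onwards is bytes, which is the only thing a gzip stream accepts.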
Of course the steps can be combined:
with gzip.GzipFile(jsonfilename, 'w') as fout:
    fout.write(json.dumps(data).encode('utf-8'))
and
with gzip.GzipFile(jsonfilename, 'r') as fin:
    data = json.loads(fin.read().decode('utf-8'))
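Another way to combine the steps, not used above, is to let gzip.open() handle the encoding: in text mode ('wt' / 'rt') it wraps the compressed stream so that str can be written and read directly, and json.dump() / json.load() can work on the file object. A minimal sketch, assuming a made-up temporary path and sample data:

```python
import gzip
import json
import os
import tempfile

# Hypothetical sample data and file location, for illustration only.
data = [{'what': 'whatever%i' % i, 'where': [1, 2, 3]} for i in range(3)]
path = os.path.join(tempfile.mkdtemp(), 'data.json.gz')

# Text mode: gzip.open() encodes str to UTF-8 bytes before compressing.
with gzip.open(path, 'wt', encoding='utf-8') as fout:
    json.dump(data, fout)

# Text mode again: bytes are decompressed and decoded back to str.
with gzip.open(path, 'rt', encoding='utf-8') as fin:
    restored = json.load(fin)
```

The four steps still happen. Steps 3 and 4 are simply carried out by the file object instead of by explicit encode() and decode() calls.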