I have been trying to use Python to extract compressed BLOBs from Oracle and decompress them, but the larger BLOBs do not get decompressed completely.
I have tried storing the BLOBs in dataframes and in files and then decompressing them, but the larger BLOBs still do not convert. Could memory be the issue? What changes can I try?
I cannot share the BLOBs as it is restricted data, and I do not have access to create test data.
I am using the following decompression code from GitHub, which works perfectly for smaller BLOBs:
https://github.com/joeatwork/python-lzw/blob/master/lzw/__init__.py
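For reference, the library's own documented usage is roughly the following; decompress_without_eoi in my sample code further down is my variant of this, and the file name here is just a placeholder:

import lzw  # joeatwork/python-lzw, linked above

# lzw.readbytes() yields the raw bytes of a file;
# lzw.decompress() is a generator of decompressed bytes, so the
# pieces are joined back into a single bytes object.
compressed = lzw.readbytes("compressed_dump.bin")  # placeholder file name
plaintext = b"".join(lzw.decompress(compressed))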
Below is my sample code:
sql_string = """select
event_id
,blob_length
,blob_field
from table"""
cur.execute(sql_string)
path = "P:/Folders/"
for row in cur:
    print('de-BLOBbing {}..\n'.format(row[0]))
    filename = path + "clinical_notes_" + str(row[0]) + "_" + str(row[1]) + ".txt"
    filename1 = path1 + "clinical_notes_" + str(row[0]) + "_" + str(row[1]) + ".txt"
    # read the LOB once and reuse the bytes for both the raw dump and the decompression
    blobbytes = row[2].read()
    # dump the raw compressed blob to disk
    f = open(filename, "wb")
    f.write(blobbytes)
    f.close()
    # decompress, strip the RTF markup, then convert the HTML to plain text
    h = html2text.HTML2Text()
    h.ignore_links = True
    f2 = h.handle(striprtf(decompress_without_eoi(blobbytes)))
    f1 = codecs.open(filename1, encoding='utf-8', mode='wb+')
    f1.write(f2)
    f1.close()
Also, in case it helps, when I put them into a dataframe, below is what it shows me for structure and memory usage, where is_blob is the BLOB field.