Unable to decompress large BLOBs in Python

Question:

I have been trying to use Python to extract compressed BLOBs from Oracle and decompress them, but the bigger BLOBs do not get decompressed completely.

I have tried storing the BLOBs both in dataframes and in files before decompressing them, but the bigger BLOBs still do not convert fully. Could memory be the issue? What changes can I try?
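One change I have been considering is reading each LOB in smaller chunks instead of a single read(), in case pulling the whole value in one call is the problem. This is only a rough sketch of what I mean (the helper name and chunk size are placeholders, not my production code):

def read_lob_in_chunks(lob, chunk_size=1024 * 1024):
    # read a cx_Oracle LOB piece by piece and return the full bytes
    total_size = lob.size()   # total length of the LOB in bytes
    offset = 1                # cx_Oracle LOB offsets are 1-based
    parts = []
    while offset <= total_size:
        data = lob.read(offset, chunk_size)
        parts.append(data)
        offset += len(data)
    return b"".join(parts)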

I cannot share the BLOBs as it is restricted data, and I don't have access to create test data.

I am using the decompression code below from GitHub, which works perfectly for smaller BLOBs:

https://github.com/joeatwork/python-lzw/blob/master/lzw/__init__.py
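My decompress_without_eoi is a small local adaptation of that module, so this is only an approximation of what it does; the unmodified module would be called roughly like this:

import lzw  # the module linked above

def decompress_blob(compressed_bytes):
    # lzw.decompress yields decompressed byte chunks; join them back into one bytes object
    return b"".join(lzw.decompress(compressed_bytes))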

Below is my sample code:

sql_string = """select 
event_id
,blob_length
,blob field

 from table"""

cur.execute(sql_string)
path = "P:/Folders/"

    for row in cur:
        print('de-BLOBbing {}..\n')
        filename = path +  "clinical_notes_" + str(row[0]) + "_" + str(row[1]) + ".txt"      
        filename1 = path1 +  "clinical_notes_" + str(row[0]) + "_" + str(row[1]) + ".txt"      
        f = open(filename, "wb")
        f.write(row[3].read())
        f.close()
        h = html2text.HTML2Text()
        h.ignore_links=True
        blobbytes = row[3].read()
        f2 = h.handle(striprtf(decompress_without_eoi(blobbytes)))
        f1 = codecs.open(filename1, encoding='utf-8', mode='wb+')
        f1.write(f2)
        f1.close()

Also, when I load them into a dataframe instead, this is the structure and memory usage it reports, where is_blob is the blob column.
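For reference, the dataframe is built and inspected roughly like this (a sketch; the column names match my query):

import pandas as pd

df = pd.DataFrame(cur.fetchall(), columns=['event_id', 'blob_length', 'is_blob'])
# 'deep' makes pandas report the actual memory taken by the object (blob) column
df.info(memory_usage='deep')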