Decoding a single column in a CSV file using Python

Posted 2020-05-03 16:33

Question:

I am very new to Python and inexperienced, but I hope someone can help me with this. I couldn't find any (understandable?) answers on Google.

I have a large (10 GB) CSV file that contains multiple columns. All columns are "normal" human-readable text except for one, which is binary. I would like to decode that column and write the decoded data back into the CSV file.

This is what I have so far, but I have a feeling I'm way off. Any help would be appreciated!

import base64
import pandas as pd

df = pd.read_csv('sample.csv', delimiter=';',
                 usecols=[3], dtype=object, header=None)
decoded_binary_data = base64.b64decode(df)

print(decoded_binary_data)
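With header=None and usecols=[3], as in the read_csv call above, pandas keeps the integer label 3 for the selected field, so the column is addressed as df[3] (a Series of strings). A quick check on a single row shaped like the CSV sample in this question; the '?' characters are copied from that sample and only stand in for whatever bytes the real file holds:

```python
import io

import pandas as pd

# One row shaped like the CSV sample in the question (semicolon-delimited,
# first three fields quoted, trailing semicolon). The '?' characters are
# placeholders copied from the sample, not the real bytes.
sample = '"5f8ebfd8-7d12-4659-a416-e5dcbe056d0a";"6";"1";ez??R?+??a)???Cs;0;0;0;74;1720;\n'

# header=None labels columns 0, 1, 2, ...; usecols=[3] keeps only the fourth
# field, and its label stays 3
df = pd.read_csv(io.StringIO(sample), delimiter=";", header=None,
                 usecols=[3], dtype=object)

print(df.columns.tolist())  # [3]
print(df[3].iloc[0])        # ez??R?+??a)???Cs
```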

Sample of the CSV:

"5f8ebfd8-7d12-4659-a416-e5dcbe056d0a";"6";"1";ez??R?+??a)???Cs;0;0;0;74;1720;
  • EDIT: cleaned up the CSV sample a bit.
  • EDIT: added a sample of the DataFrame.

Sample of the DataFrame:

0                                       ez??R?+??a)???Cs
1                       B?t?a?h?kwd?W-]\???fc?m[m?A}??? 
2                       ?eE????3r??c??T????fc?m[m?A}??? 
3                       ?eE????3r??c??T????fc?m[m?A}??? 
4                       ?eE????3r??c??T????fc?m[m?A}??? 
5                       B?t?a?h?kwd?W-]\???fc?m[m?A}??? 
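Many values in the sample above contain characters such as '?', ')' and ']', none of which belong to the base64 alphabet. If those '?' characters are how non-printable bytes are being displayed, the column may hold raw binary rather than base64 text, and base64.b64decode would reject it. A small check with a hypothetical helper, looks_like_base64, can tell the two cases apart before decoding the whole file:

```python
import base64


def looks_like_base64(value):
    """Hypothetical helper: True if value decodes as strict base64."""
    if not isinstance(value, (str, bytes)):
        return False  # e.g. NaN for an empty field
    try:
        # validate=True rejects anything outside A-Z, a-z, 0-9, '+', '/', '='
        base64.b64decode(value, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    return True


print(looks_like_base64("aGVsbG8="))          # True
print(looks_like_base64("ez??R?+??a)???Cs"))  # False
```

Applied to the column as df[3].map(looks_like_base64).all(), it reports whether every value is strict base64.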

Answer 1:

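base64.b64decode works on a single string or bytes value, not on a whole DataFrame, so apply it to each value in the column: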

import base64

# header=None with usecols=[3] labels the column 3; use the real
# column name instead if the file has a header row
decoded_binary_data = df[3].apply(base64.b64decode)
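A self-contained run of the same idea on two hypothetical base64 values (not taken from the asker's file) shows that apply calls b64decode once per cell and that each decoded cell is a bytes object, not a str:

```python
import base64

import pandas as pd

# Hypothetical base64 text in a column labelled 3, mirroring header=None
df = pd.DataFrame({3: ["aGVsbG8=", "d29ybGQ="]})

# apply passes each cell to b64decode individually
decoded_binary_data = df[3].apply(base64.b64decode)
print(decoded_binary_data.tolist())  # [b'hello', b'world']
```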

For more on applying functions to DataFrame columns, see: https://chrisalbon.com/python/pandas_apply_operations_to_dataframes.html
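The question also asks to write the decoded data back into the CSV file. A minimal sketch, assuming the column really is base64 and is the fourth field (label 3): it reads the 10 GB file in chunks via read_csv's chunksize so the whole file is never in memory, decodes each cell, and writes to a new file instead of overwriting the input. decode_cell and decode_column are hypothetical names, and turning the decoded bytes into UTF-8 text (or hex when that fails) is an assumption; raw bytes objects would otherwise end up in the file as their b'...' representation.

```python
import base64

import pandas as pd


def decode_cell(value):
    """Hypothetical helper: decode one base64 cell into a CSV-safe string.

    Non-string cells (NaN for empty fields) and values that are not valid
    base64 are returned unchanged.
    """
    if not isinstance(value, str):
        return value
    try:
        # validate=True rejects characters outside the base64 alphabet;
        # binascii.Error (bad characters/padding) subclasses ValueError,
        # and a non-ASCII str raises ValueError as well
        raw = base64.b64decode(value, validate=True)
    except ValueError:
        return value
    try:
        # Assumption: the decoded bytes are UTF-8 text
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8 text: store the bytes as hex so the CSV stays printable
        return raw.hex()


def decode_column(src, dst, column=3, chunksize=100_000):
    """Hypothetical helper: stream src in chunks, decode one column, write dst."""
    reader = pd.read_csv(src, delimiter=";", header=None, dtype=object,
                         chunksize=chunksize)
    for i, chunk in enumerate(reader):
        chunk[column] = chunk[column].map(decode_cell)
        # Create dst with the first chunk, then append the remaining chunks
        chunk.to_csv(dst, sep=";", header=False, index=False,
                     mode="w" if i == 0 else "a")
```

Usage would look like decode_column("sample.csv", "sample_decoded.csv"). Writing to a separate file keeps the original intact if decoding fails halfway. Note that to_csv only quotes fields when needed, so a field written as "6" in the input comes out as 6.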