MD5 Hashing a CSV with Python

Posted 2019-07-20 11:11

Question:

I have a CSV of email addresses that need to be hashed with MD5, and I want to save the hashed emails as a new CSV. I haven't seen my exact use case on SO and haven't been able to adapt the answers to existing questions.

The original file path is "/Users/[username]/Downloads/email_original.csv" and the desired output file is "/Users/[username]/Downloads/email_hashed.csv".

Original File

email_addr
fake_email1@yahoo.com
fake_email2@gmail.com
fake_email3@college.edu
fake_email4@hotmail.com
fake_email5@ford.com

Hashed File

email_addr
0x3731BF23851200A7607BA554EEAF7912
0xA5D5D3B99896D32BAC64162BD56BE177
0xAE03858BDFBDF622AF5A1852317500C3
0xC870F8D75180AC9DA2188129C910489B
0xD7AFD8085548808459BDEF8665C8D52A

Answer 1:

The answer in your comment is nearly correct. You only need to open a second file in write mode ('w'). I have changed your code to use with so you don't have to explicitly close the file handles:

import hashlib

# Binary mode yields bytes, which hashlib.md5() requires in Python 3.
with open("/Users/[username]/Downloads/email_original.csv", 'rb') as file:
    with open("/Users/[username]/Downloads/email_hashed.csv", 'w') as output:
        for line in file:  # note: the header line is hashed too
            line = line.strip()
            digest = hashlib.md5(line).hexdigest()
            print(digest)
            output.write(digest + '\n')
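
The loop above hashes every line, header included, and writes bare lowercase digests. As a rough sketch of one way to get closer to the format shown in the question (header kept, '0x' prefix, uppercase hex), assuming a single-column file whose addresses are ASCII-encodable; format_md5 and hash_lines are hypothetical helper names, not part of the original answer:

```python
import hashlib


def format_md5(text, encoding="ascii"):
    """Return the MD5 of text as '0x' followed by uppercase hex digits."""
    digest = hashlib.md5(text.encode(encoding)).hexdigest()
    return "0x" + digest.upper()


def hash_lines(lines):
    """Yield the header unchanged, then one formatted digest per non-empty line."""
    iterator = iter(lines)
    header = next(iterator, None)
    if header is None:
        return  # empty input: nothing to yield
    yield header.strip()
    for line in iterator:
        value = line.strip()
        if value:  # skip blank lines rather than hashing the empty string
            yield format_md5(value)
```

To apply this to the files in the question, one could open the input in text mode, pass the file object to hash_lines, and write each yielded string followed by a newline.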


Answer 2:

Jaco's answer is good but incomplete, since it neglects the text encoding of the data fed to MD5. The code would also be insufficient if the CSV format were later extended with other columns. Here is an example that tackles both problems, while also making it easy to change the hash algorithm later and to specify other columns, each with its own hash algorithm:

import csv
import hashlib

IN_PATH = 'email_original.csv'
OUT_PATH = 'email_hashed.csv'
ENCODING = 'ascii'
HASH_COLUMNS = dict(email_addr='md5')


def main():
    with open(IN_PATH, 'rt', encoding=ENCODING, newline='') as in_file, \
            open(OUT_PATH, 'wt', encoding=ENCODING, newline='') as out_file:
        reader = csv.DictReader(in_file)
        writer = csv.DictWriter(out_file, reader.fieldnames)
        writer.writeheader()
        for row in reader:
            for column, method in HASH_COLUMNS.items():
                data = row[column].encode(ENCODING)
                digest = hashlib.new(method, data).hexdigest()
                row[column] = '0x' + digest.upper()
            writer.writerow(row)

if __name__ == '__main__':
    main()
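
To illustrate the per-column idea, here is a small sketch that runs the same loop over in-memory text instead of files, with a second column hashed using SHA-256. The hash_csv function name, the phone and name columns, and the sample values are all assumptions made for the example:

```python
import csv
import hashlib
import io

ENCODING = 'ascii'
# Hypothetical mapping: column name -> hashlib algorithm name.
HASH_COLUMNS = {'email_addr': 'md5', 'phone': 'sha256'}


def hash_csv(in_file, out_file, hash_columns=HASH_COLUMNS):
    """Copy CSV rows from in_file to out_file, hashing the chosen columns."""
    reader = csv.DictReader(in_file)
    writer = csv.DictWriter(out_file, reader.fieldnames)
    writer.writeheader()
    for row in reader:
        for column, method in hash_columns.items():
            # hashlib works on bytes, so the text is encoded explicitly.
            data = row[column].encode(ENCODING)
            digest = hashlib.new(method, data).hexdigest()
            row[column] = '0x' + digest.upper()
        writer.writerow(row)  # columns not listed are copied unchanged


# In-memory stand-ins for the input and output files.
source = io.StringIO('email_addr,phone,name\nhello,hello,Alice\n', newline='')
target = io.StringIO(newline='')
hash_csv(source, target)
rows = list(csv.DictReader(io.StringIO(target.getvalue(), newline='')))
```

After this runs, rows holds one dictionary in which email_addr and phone have been replaced by their prefixed digests and name is left as it was.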


Tags: python csv hash