I have a CSV of email addresses that need to be hashed with MD5, and I want to save the hashed emails as a new CSV. I haven't seen my exact use case on SO and haven't been able to adapt the answers to existing questions.
The original file path is "/Users/[username]/Downloads/email_original.csv" and the desired output file would be "/Users/[username]/Downloads/email_hashed.csv".
Original File
email_addr
fake_email1@yahoo.com
fake_email2@gmail.com
fake_email3@college.edu
fake_email4@hotmail.com
fake_email5@ford.com
Hashed File
email_addr
0x3731BF23851200A7607BA554EEAF7912
0xA5D5D3B99896D32BAC64162BD56BE177
0xAE03858BDFBDF622AF5A1852317500C3
0xC870F8D75180AC9DA2188129C910489B
0xD7AFD8085548808459BDEF8665C8D52A
The answer in your comment is nearly correct. You only need to open another file in write mode ('w'). I have changed your code to use with so you don't have to explicitly close the file handles:
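import hashlib

# Binary mode yields bytes, which is what hashlib.md5 expects.
# Note: the header line is hashed along with the addresses.
with open("/Users/[username]/Downloads/email_original.csv", 'rb') as file:
    with open("/Users/[username]/Downloads/email_hashed.csv", 'w') as output:
        for line in file:
            line = line.strip()
            digest = hashlib.md5(line).hexdigest()
            print(digest)
            output.write(digest + '\n')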
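Note that hexdigest() returns lowercase hex without a prefix, whereas the hashed file in the question shows uppercase values prefixed with 0x. A minimal sketch of a helper that produces that format, assuming Python 3 and UTF-8 input (the name format_md5 and its encoding parameter are hypothetical, not part of the original code):

```python
import hashlib


def format_md5(value, encoding="utf-8"):
    """Return the MD5 digest of value as '0x' followed by uppercase hex."""
    # Strip surrounding whitespace (such as the trailing newline) before hashing.
    data = value.strip().encode(encoding)
    return "0x" + hashlib.md5(data).hexdigest().upper()


print(format_md5("fake_email1@yahoo.com"))
```

Calling this on every line after the first would also leave the email_addr header unhashed.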
Jaco's answer is good but incomplete, since it neglects the encoding of the text passed to the MD5 hash. The code would also be insufficient if the CSV format were later modified to include other columns. Here is an example that tackles both problems. It also makes it easy to change the hash in the future and to specify other columns, each of which can have its own hash algorithm applied:
import csv
import hashlib

IN_PATH = 'email_original.csv'
OUT_PATH = 'email_hashed.csv'
ENCODING = 'ascii'
# Maps each column to hash onto the name of the algorithm to use.
HASH_COLUMNS = dict(email_addr='md5')


def main():
    # newline='' lets the csv module handle line endings itself.
    with open(IN_PATH, 'rt', encoding=ENCODING, newline='') as in_file, \
            open(OUT_PATH, 'wt', encoding=ENCODING, newline='') as out_file:
        reader = csv.DictReader(in_file)
        writer = csv.DictWriter(out_file, reader.fieldnames)
        writer.writeheader()
        for row in reader:
            for column, method in HASH_COLUMNS.items():
                # Hash functions operate on bytes, so encode the text first.
                data = row[column].encode(ENCODING)
                digest = hashlib.new(method, data).hexdigest()
                row[column] = '0x' + digest.upper()
            writer.writerow(row)


if __name__ == '__main__':
    main()
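As a rough sketch of how the column mapping might be extended, the same loop can be run against in-memory data with a second hashed column. The phone and name columns, the sample row, and the hash_rows function are all hypothetical and only serve to illustrate the idea:

```python
import csv
import hashlib
import io

ENCODING = 'ascii'
# Hypothetical mapping: 'phone' does not exist in the original file.
HASH_COLUMNS = dict(email_addr='md5', phone='sha256')


def hash_rows(in_file, out_file):
    """Copy CSV rows from in_file to out_file, hashing the mapped columns."""
    reader = csv.DictReader(in_file)
    writer = csv.DictWriter(out_file, reader.fieldnames)
    writer.writeheader()
    for row in reader:
        for column, method in HASH_COLUMNS.items():
            data = row[column].encode(ENCODING)
            digest = hashlib.new(method, data).hexdigest()
            row[column] = '0x' + digest.upper()
        writer.writerow(row)


# In-memory stand-ins for the input and output files.
source = io.StringIO('email_addr,phone,name\nabc,555-0100,Ann\n')
target = io.StringIO()
hash_rows(source, target)
print(target.getvalue())
```

Columns that are not listed in HASH_COLUMNS, such as name here, are written through unchanged.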