I asked a similar question for R about creating a hash value for each row of data. I know that I can use something like hashlib.md5(b'Hello World').hexdigest()
to hash a string, but how do I hash a row in a DataFrame?
Update 01
I have drafted my code as below:
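import hashlib
for index, row in course_staff_df.iterrows():
    # md5 needs bytes in Python 3, so encode the string form of the selected values
    temp_df.loc[index, 'hash'] = hashlib.md5(str(row[['cola', 'colb']].values).encode()).hexdigest()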
This doesn't seem very Pythonic to me. Is there a better solution?
You can sum the hashes of all of the elements in the row:
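A minimal sketch of summing the element hashes for one row. The DataFrame `df` and its values are made up for illustration; only the column names come from the question. Note that a sum ignores order, so rows holding the same values in different columns get the same result.

```python
import pandas as pd

# Hypothetical sample data; the column names are borrowed from the question.
df = pd.DataFrame({"cola": [1, 2, 3], "colb": ["x", "y", "z"]})

row = df.iloc[0]  # one row, as a Series

# Hash each element with the built-in hash() and add the results together.
row_hash = sum(hash(value) for value in row)
print(row_hash)
```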
A different method would be to coerce the row (a Series object) to a tuple:
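A sketch of the tuple approach for a single row, again on a made-up `df`. Unlike a sum, the hash of a tuple depends on the order of its elements. One caveat worth knowing: the built-in `hash()` of a string is salted per Python process unless `PYTHONHASHSEED` is set, so these values are only comparable within one run.

```python
import pandas as pd

# Hypothetical sample data; the column names are borrowed from the question.
df = pd.DataFrame({"cola": [1, 2, 3], "colb": ["x", "y", "z"]})

row = df.iloc[0]  # one row, as a Series

# A Series is not hashable, but a tuple of its values is.
row_hash = hash(tuple(row))
print(row_hash)
```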
To do this for every row and append the result as a new column:
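One way this could look, assuming the summed-hash variant and a new column named `hash` (the name and the sample data are assumptions, not taken from the original answer):

```python
import pandas as pd

# Hypothetical sample data; the column names are borrowed from the question.
df = pd.DataFrame({"cola": [1, 2, 3], "colb": ["x", "y", "z"]})

# Build one summed hash per row, in row order, and store it as a new column.
df["hash"] = [sum(hash(value) for value in row) for _, row in df.iterrows()]
print(df)
```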
If you'd rather hash the tuple of the row:
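The same loop with the tuple hash swapped in. As before, the `hash` column name and the sample data are assumptions made for the sketch:

```python
import pandas as pd

# Hypothetical sample data; the column names are borrowed from the question.
df = pd.DataFrame({"cola": [1, 2, 3], "colb": ["x", "y", "z"]})

# Convert each row to a tuple and hash it; the result is order-sensitive.
df["hash"] = [hash(tuple(row)) for _, row in df.iterrows()]
print(df)
```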
Or simply:
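A shorter form that probably matches the intent here uses `DataFrame.apply` with `axis=1`, which passes each row to the function as a Series. The sample `df` is hypothetical:

```python
import pandas as pd

# Hypothetical sample data; the column names are borrowed from the question.
df = pd.DataFrame({"cola": [1, 2, 3], "colb": ["x", "y", "z"]})

# axis=1 applies the function row by row; each row arrives as a Series.
hashes = df.apply(lambda row: hash(tuple(row)), axis=1)
print(hashes)
```

The result is a Series aligned to the index of `df`, so it can be assigned directly with `df['hash'] = hashes`.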
As an example:
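A self-contained run on random data might look like the sketch below. The 3x5 shape and the seed are arbitrary choices for illustration, and no output is shown because the values depend on the random numbers generated.

```python
import numpy as np
import pandas as pd

# Arbitrary 3x5 frame of random floats; the seed only makes the run repeatable.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((3, 5)))
print(df)

# One hash per row, computed from the tuple of that row's values.
hashes = df.apply(lambda row: hash(tuple(row)), axis=1)
print(hashes)
```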