Hash each row of pandas dataframe column using app

2019-07-22 09:45发布

问题:

I'm trying to hash each value of a python 3.6 pandas dataframe column with the following algorithm on the dataframe-column ORIG:

HK_ORIG = base64.b64encode(hashlib.sha1(str(df.ORIG).encode("UTF-8")).digest())

However, the above mentioned code does not hash each value of the column, so, in order to hash each value of the df-column ORIG, I need to use the apply function. Unfortunatelly, I don't seem to be good enough to get this done.

I imagine it to look like the following code:

df["HK_ORIG"] = str(df['ORIG']).encode("UTF-8")).apply(hashlib.sha1)

I'm looking very much forward to your answers! Many thanks in advance!

回答1:

You can either create a named function and apply it - or apply a lambda function. In either case, do as much processing as possible withing the dataframe.

A lambda-based solution:

df['ORIG'].astype(str).str.encode('UTF-8')\
          .apply(lambda x: base64.b64encode(hashlib.sha1(x).digest()))

A named function solution:

def hashme(x):
    return base64.b64encode(hashlib.sha1(x).digest())
df['ORIG'].astype(str).str.encode('UTF-8')\
          .apply(hashme)