Iterate over a specific Spark DataFrame column

Posted 2019-08-27 18:30

Question:

I want to encrypt a few columns of a Spark DataFrame based on some condition. The encrypt/decrypt function below works fine:

from cryptography.fernet import Fernet

def EncryptDecrypt(Encrypt, str):
    # Hard-coded Fernet key; in practice this should come from a secure store.
    key = b'B5oRyf5Zs3P7atXIf-I5TaCeF3aM1NEILv3A7Zm93b4='
    cipher_suite = Fernet(key)
    if Encrypt is True:
        # Fernet works on bytes, so encode the plaintext string first.
        a = bytes(str, "utf-8")
        return cipher_suite.encrypt(a)
    else:
        return cipher_suite.decrypt(str)
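
For reference, a quick round trip with this function looks like the following (the sample value is just an illustration):

    token = EncryptDecrypt(True, "123-45-6789")   # returns a bytes token
    plain = EncryptDecrypt(False, token)          # returns b'123-45-6789'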

Now I want to iterate over a specific DataFrame column to encrypt it: if the encryption condition is satisfied, I have to encrypt the values in that column (see the sketch after the pseudocode below).

if sqldf.filter(<condition satisfied>).count() > 0:
    # iterate over that specific column and encrypt its data
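
One way to do this without row-by-row iteration is to wrap the function in a UDF and apply it with withColumn, which replaces the column in place and so preserves its position. A minimal sketch; the column name "ssn" and the not-null condition are placeholders, not from the original question:

    from pyspark.sql.functions import udf, col
    from pyspark.sql.types import BinaryType

    # Wrap EncryptDecrypt so nulls pass through instead of crashing bytes().
    encrypt_udf = udf(lambda s: EncryptDecrypt(True, s) if s is not None else None,
                      BinaryType())

    if sqldf.filter(col("ssn").isNotNull()).count() > 0:
        # withColumn with an existing column name replaces that column in place,
        # so the DataFrame's column order is unchanged.
        sqldf = sqldf.withColumn("ssn", encrypt_udf(col("ssn")))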

I have to maintain the DataFrame's column positions, so I can't just append the encrypted column at the end.

Please help me iterate over the DataFrame rows, and let me know if there is a more optimized approach.


Below is the approach I am using (edit):

I am trying to call the UDF through Spark SQL, but I get a TypeError: encoding without a string argument raised at a = bytes(str, "utf-8"). Below is the code I am using to register the UDF and execute it with Spark SQL:

spark.udf.register("my_udf", EncryptDecrypt, ByteType())
sqldf1 = spark.sql("SELECT my_udf(True, " + column + ") FROM df1")

Here, column is the field name.
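
A likely cause of the TypeError is that the column contains non-string values (nulls, numbers), so bytes(str, "utf-8") receives something other than a string. Also, Fernet returns bytes, so BinaryType rather than ByteType is probably the right return type. A minimal sketch of a more defensive registration, assuming df1 is a DataFrame that still needs to be registered as a temp view and column holds the column name:

    from pyspark.sql.types import BinaryType

    def encrypt_value(v):
        # Pass nulls through and coerce everything else to a string
        # before handing it to EncryptDecrypt.
        if v is None:
            return None
        return EncryptDecrypt(True, str(v))

    spark.udf.register("my_udf", encrypt_value, BinaryType())
    df1.createOrReplaceTempView("df1")  # assumption: df1 is not yet a view
    sqldf1 = spark.sql("SELECT my_udf(" + column + ") AS " + column + " FROM df1")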