I want to encrypt a few columns of a Spark dataframe based on some condition. The encrypt/decrypt function below works fine:
from cryptography.fernet import Fernet

def EncryptDecrypt(Encrypt, str):
    key = b'B5oRyf5Zs3P7atXIf-I5TaCeF3aM1NEILv3A7Zm93b4='
    cipher_suite = Fernet(key)
    if Encrypt is True:
        # encode the plaintext string and return the Fernet token (bytes)
        a = bytes(str, "utf-8")
        return cipher_suite.encrypt(a)
    else:
        # expects the Fernet token (bytes), returns the plaintext bytes
        return cipher_suite.decrypt(str)
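For reference, a quick round-trip of this function outside Spark (plain Python; "some value" is just a placeholder) looks like:

token = EncryptDecrypt(True, "some value")   # bytes: the Fernet token
plain = EncryptDecrypt(False, token)         # bytes: b'some value'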
Now I want to iterate over a specific dataframe column to encrypt it. Only if the encryption condition is satisfied do I iterate over that column:
if sqldf.filter(<condition satisfied>).count() > 0:
    # iterate over that specific column to encrypt its data
I have to maintain the dataframe column positions, so I can't add the encrypted column at the end.
Please help me iterate over the dataframe rows, and let me know if there is any other, more optimized approach.
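For context, here is a minimal sketch of what I have in mind, reusing the same key and the sqldf dataframe from above; the "ssn" column, the "flag" column, and the gating condition are placeholders, not my real schema. withColumn replaces the column in place, so the original column order is kept:

from cryptography.fernet import Fernet
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

key = b'B5oRyf5Zs3P7atXIf-I5TaCeF3aM1NEILv3A7Zm93b4='

def encrypt_value(plain):
    # runs on the executors; returns the Fernet token as a UTF-8 string so the
    # column keeps its StringType and its position in the schema
    if plain is None:
        return None
    return Fernet(key).encrypt(plain.encode("utf-8")).decode("utf-8")

encrypt_udf = udf(encrypt_value, StringType())

# only encrypt when the condition is met; withColumn replaces the column
# in place, so the original column order is preserved
if sqldf.filter(col("flag") == "encrypt").count() > 0:
    sqldf = sqldf.withColumn("ssn", encrypt_udf(col("ssn")))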
Below is the approach I am using (Edit):
I am trying to call the UDF through Spark SQL, but I am getting the following error:

a = bytes(str, "utf-8")
TypeError: encoding without a string argument

Below is the code I am using to register the UDF and execute it with Spark SQL:
spark.udf.register("my_udf", EncryptDecrypt, ByteType())
sqldf1 = spark.sql("Select " + my_udf(True, " + column + ") from df1")
column is the field name.
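For comparison, this is a minimal sketch of how I understand the registration and SQL call should look, assuming df1 is exposed as a temp view named "df1" and the Python variable column holds the column name; BinaryType is used here because Fernet.encrypt returns bytes:

from pyspark.sql.types import BinaryType

spark.udf.register("my_udf", EncryptDecrypt, BinaryType())
df1.createOrReplaceTempView("df1")

# the UDF name and its arguments stay inside the SQL text; only the column
# name is interpolated, so Spark passes the column's string value to the UDF
sqldf1 = spark.sql("SELECT my_udf(True, " + column + ") FROM df1")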