The PySpark DataFrameWriter class has a jdbc method for writing a DataFrame to a SQL database. Its mode parameter accepts an ignore option (mode="ignore") that the documentation says will:
Silently ignore this operation if data already exists.
But will it skip the entire write, or will it only skip inserting the rows that are duplicates? What if I were to combine ignore with the append mode? Would the behavior change?
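For reference, here is a minimal sketch of the kind of call I mean. The helper name write_jdbc, the URL, the table name, and the credentials are all placeholders I made up for illustration. As far as I can tell, mode is a single string argument, so I don't see an obvious way to pass ignore and append in the same call.

```python
def write_jdbc(df, url, table, mode, properties):
    """Write df to a JDBC table via DataFrameWriter.jdbc with one save mode.

    mode is a single string such as "append", "overwrite", "ignore", or "error".
    This wrapper is hypothetical; it only forwards its arguments.
    """
    df.write.jdbc(url=url, table=table, mode=mode, properties=properties)


# Placeholder connection details (assumptions, not a real setup).
JDBC_URL = "jdbc:postgresql://localhost:5432/mydb"
JDBC_PROPERTIES = {
    "user": "me",
    "password": "secret",
    "driver": "org.postgresql.Driver",
}

# Usage, given an existing DataFrame named df:
# write_jdbc(df, JDBC_URL, "my_table", "ignore", JDBC_PROPERTIES)
```

Switching the fourth argument to "append" is the only other variation I can see.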