Does ignore option of Pyspark DataFrameWriter jdbc

Posted 2019-03-02 08:20

Question:

The PySpark DataFrameWriter class has a jdbc function for writing a DataFrame to a SQL database. This function has an --ignore option that the documentation says will:

Silently ignore this operation if data already exists.

But will it ignore the entire transaction, or will it only ignore inserting the rows that are duplicates? What if I were to combine --ignore with the --append flag? Would the behavior change?

Answer 1:

mode("ignore") is just a no-op if the table (or another sink) already exists: nothing is written at all, so it does not filter duplicate rows. Write modes cannot be combined, so "ignore" and "append" cannot be used together. If you're looking for something like INSERT IGNORE or INSERT INTO ... WHERE NOT EXISTS ... you'll have to do it manually, for example with mapPartitions.
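As a rough illustration of the manual route, here is a minimal sketch of a per-partition writer that skips rows whose key is already present. Everything specific in it is an assumption made for the example: the `users` table, its `id` and `name` columns, the helper names `setup_target` and `insert_missing`, and the use of a SQLite file as a stand-in for the real JDBC target so the snippet runs without a database server. The INSERT ... WHERE NOT EXISTS syntax may need adjusting for your database.

```python
import os
import sqlite3
import tempfile

# Hypothetical target: a SQLite file stands in for the real database.
DB_PATH = os.path.join(tempfile.mkdtemp(), "target.db")

# Insert a row only if no row with the same id exists yet.
INSERT_IF_MISSING = (
    "INSERT INTO users (id, name) "
    "SELECT ?, ? WHERE NOT EXISTS (SELECT 1 FROM users WHERE id = ?)"
)


def setup_target():
    """Create the example table with one pre-existing row."""
    conn = sqlite3.connect(DB_PATH)
    try:
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)"
            )
            conn.execute("INSERT OR IGNORE INTO users VALUES (1, 'alice')")
    finally:
        conn.close()


def insert_missing(rows):
    """Write one partition of (id, name) rows, skipping existing ids.

    Opens its own connection, since connections cannot be shared
    across partitions. Returns an iterator holding the number of rows
    actually inserted, which is the shape mapPartitions expects.
    """
    conn = sqlite3.connect(DB_PATH)
    try:
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                INSERT_IF_MISSING, ((r[0], r[1], r[0]) for r in rows)
            )
        inserted = conn.total_changes
    finally:
        conn.close()
    return iter([inserted])
```

With Spark, a function like this could be applied as `df.rdd.mapPartitions(insert_missing).sum()`, or passed to `foreachPartition` if the count is not needed. Each partition runs in its own transaction, so this is not atomic across the whole DataFrame.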