Does ignore option of Pyspark DataFrameWriter jdbc

Posted 2019-03-02 08:20

闹够了就滚

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

Question:

The PySpark DataFrameWriter class has a jdbc function for writing a DataFrame to a SQL database. This function has an --ignore option that the documentation says will:

Silently ignore this operation if data already exists.

But will it ignore the entire transaction, or will it only ignore inserting the rows that are duplicates? What if I were to combine --ignore with the --append flag? Would the behavior change?

Answer 1:

mode("ignore") is simply a no-op if the table (or another sink) already exists: nothing is written at all, so it does not skip individual duplicate rows. Write modes cannot be combined, so it cannot be used together with append. If you're looking for something like INSERT IGNORE or INSERT INTO ... WHERE NOT EXISTS ..., you'll have to do it manually, for example with mapPartitions.
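
A minimal sketch of the manual, per-partition approach mentioned above, not a definitive implementation. It assumes a hypothetical table `people(id, name)` with `id` as the primary key, and it uses the standard-library `sqlite3` module as a stand-in for a real MySQL connection so that the snippet runs on its own. The helper name `insert_ignore_partition` and the `db_path` parameter are made up for illustration.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical table and columns, used only for illustration.
CREATE_SQL = "CREATE TABLE IF NOT EXISTS people (id INTEGER PRIMARY KEY, name TEXT)"

# SQLite spelling of "skip rows that violate a unique key".
# On MySQL the equivalent statement would be:
#   INSERT IGNORE INTO people (id, name) VALUES (%s, %s)
INSERT_SQL = "INSERT OR IGNORE INTO people (id, name) VALUES (?, ?)"


def insert_ignore_partition(rows: Iterable[Tuple[int, str]], db_path: str) -> None:
    """Insert one partition's rows, skipping rows whose key already exists."""
    conn = sqlite3.connect(db_path)  # one connection per partition
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(CREATE_SQL)
            conn.executemany(INSERT_SQL, rows)
    finally:
        conn.close()
```

With a DataFrame `df` whose columns are `id` and `name`, a function like this could be applied with `df.rdd.foreachPartition(lambda rows: insert_ignore_partition(rows, db_path))`, since Spark `Row` objects behave like tuples. In a real job the connection would come from a MySQL driver on each executor rather than a local SQLite file.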