Below are the last two lines of my PySpark ETL code:
df_writer = DataFrameWriter(usage_fact)
df_writer.partitionBy("data_date", "data_product").saveAsTable(usageWideFactTable, format=fileFormat, mode=writeMode, path=usageWideFactpath)
where writeMode = "append" and fileFormat = "orc".
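With writeMode = "append", every re-run writes the same rows into the partitions again. The toy model below is plain Python, not Spark. It assumes a hypothetical layout where a table is a dict mapping each (data_date, data_product) partition to its rows, and the helper names are made up for illustration. It contrasts append with a dynamic partition overwrite, which replaces only the partitions present in the incoming batch and leaves the others alone.

```python
from collections import defaultdict

# Toy model only (hypothetical, not Spark's API): a "table" is a dict
# mapping (data_date, data_product) -> list of rows in that partition.

def partition_key(row):
    # A row's partition is identified by its partition column values.
    return (row["data_date"], row["data_product"])

def write_append(table, rows):
    # mode="append": rows are added to whatever each partition already holds.
    for row in rows:
        table[partition_key(row)].append(row)

def write_dynamic_overwrite(table, rows):
    # Dynamic partition overwrite: each partition present in the incoming
    # batch is replaced wholesale; partitions not in the batch are untouched.
    incoming = defaultdict(list)
    for row in rows:
        incoming[partition_key(row)].append(row)
    for key, new_rows in incoming.items():
        table[key] = new_rows

batch = [
    {"data_date": "2018-01-01", "data_product": "A", "usage": 10},
    {"data_date": "2018-01-01", "data_product": "B", "usage": 5},
]

appended = defaultdict(list)
write_append(appended, batch)
write_append(appended, batch)  # re-running the same ETL batch

overwritten = defaultdict(list)
write_dynamic_overwrite(overwritten, batch)
write_dynamic_overwrite(overwritten, batch)  # re-run does not duplicate

print(sum(len(rows) for rows in appended.values()))     # 4
print(sum(len(rows) for rows in overwritten.values()))  # 2
```

Running the batch twice leaves four rows under append but still two under the partition-level overwrite, which is the re-run behaviour the question is after.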
I want to use INSERT OVERWRITE instead, so that re-running the code does not append the same data again. So I tried this:
fact = spark.sql("insert overwrite table " + usageWideFactTable + " partition (data_date, data_product) select * from usage_fact")
But this gives me the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/spark/python/pyspark/sql/", line 545, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
File "/usr/lib/spark/python/lib/", line 1133, in __call__
File "/usr/lib/spark/python/pyspark/sql/", line 69, in deco
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u'Cannot overwrite a path that is also being read from.;'
It looks like I cannot overwrite a path that I am also reading from, but I don't know how to fix this, as I am new to PySpark. What code should I use to get rid of this error?
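This error usually means the query producing `usage_fact` reads from the same files that the overwrite is about to replace. One common workaround, sketched here under assumptions: Spark 2.3 or later (needed for the `spark.sql.sources.partitionOverwriteMode` setting), a target table created by `saveAsTable(..., format="orc")` as above, and a staging location that is not under the table's path. The data is first materialized to that staging location so the final write no longer reads the table's own files, and then only the partitions present in the new data are overwritten. The function name `overwrite_partitions` and the staging path are hypothetical.

```python
def overwrite_partitions(spark, df, table, staging_path):
    """Replace only the partitions of `table` that appear in `df`.

    spark: an active SparkSession (Spark 2.3+ assumed).
    df: the DataFrame to write; its partition columns (data_date,
        data_product) must be its last columns, in the table's order.
    staging_path: hypothetical scratch location, NOT under the table's path.
    """
    # Overwrite only the partitions present in the incoming data
    # instead of truncating the whole table.
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

    # Break the lineage: after this, the final write reads from
    # staging_path rather than from the table's own files.
    df.write.mode("overwrite").orc(staging_path)
    staged = spark.read.orc(staging_path)

    # insertInto matches columns by position, not by name.
    staged.write.mode("overwrite").insertInto(table)
```

For example, `overwrite_partitions(spark, usage_fact, usageWideFactTable, "/tmp/usage_fact_staging")`, where the path is only a placeholder. To keep the SQL form instead, the same idea applies: register `spark.read.orc(staging_path)` as the `usage_fact` temp view with `createOrReplaceTempView("usage_fact")` and then run the original INSERT OVERWRITE statement. The staged files can be deleted once the insert has finished.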