I wrote a DataFrame to HDFS with PySpark using this command:
from pyspark.sql.functions import col

df.repartition(col("year")) \
    .write.option("maxRecordsPerFile", 1000000) \
    .parquet('/path/tablename', mode='overwrite', partitionBy=["year"], compression='snappy')
Looking at HDFS, I can see that the files are there as expected (a quick read-back check from Spark is shown below). However, when I try to query the table with Hive or Impala, the table cannot be found.
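For reference, a check like the following (using my existing SparkSession, spark) reads the raw path back without going through the metastore at all:

# read the written path back directly; this only touches HDFS,
# not the Hive metastore
check = spark.read.parquet('/path/tablename')
check.printSchema()
check.groupBy("year").count().show()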
What's going wrong here? Am I missing something?
Interestingly, df.write.format('parquet').saveAsTable("tablename")
works properly.
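For comparison, this is roughly the full form of the saveAsTable call that works; the partitionBy and compression options here are just my attempt to mirror the direct write above, and as far as I understand saveAsTable registers the table in the Hive metastore:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# enableHiveSupport so the table is registered in the Hive metastore
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# df is the same DataFrame as above; options mirror the direct .parquet() write
df.repartition(col("year")) \
    .write \
    .format('parquet') \
    .mode('overwrite') \
    .partitionBy('year') \
    .option('compression', 'snappy') \
    .saveAsTable('tablename')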