External Table not getting updated from parquet fi

Posted 2019-07-18 08:36


混吃等死


Question:

I am using Spark Streaming to write aggregated output as Parquet files to HDFS using SaveMode.Append. I have an external table created like this:

CREATE TABLE if not exists rolluptable
USING org.apache.spark.sql.parquet
OPTIONS (
  path "hdfs:////"
);

I was under the impression that, for an external table, queries should also fetch data from newly added Parquet files. However, the newly written files do not seem to be picked up.

Dropping and recreating the table every time works, but it is not a practical solution.

Please suggest how my table can include the data from the newer files as well.

Answer 1:

Are you reading those tables with Spark? If so, Spark caches Parquet table metadata (since schema discovery can be expensive).

To overcome this, you have two options:

1. Set the config spark.sql.parquet.cacheMetadata to false.
2. Refresh the table before the query: sqlContext.refreshTable("my_table")

See here for more details: http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-metastore-parquet-table-conversion
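
As a rough sketch, the two options might look like this when issued as Spark SQL statements instead of through the sqlContext API. It assumes the table is the rolluptable from the question and that the statements run in the same Spark session that later queries the table. The final SELECT is only an illustrative query, not something from the original post.

```sql
-- Option 1: turn off Parquet metadata caching for the current session.
-- Whether this setting has any effect may depend on the Spark version.
SET spark.sql.parquet.cacheMetadata=false;

-- Option 2: refresh the cached metadata for the table, then query as usual.
-- REFRESH TABLE is the SQL counterpart of the refreshTable call above.
REFRESH TABLE rolluptable;
SELECT COUNT(*) FROM rolluptable;
```

With a streaming job that keeps appending files, the refresh has to be repeated before each query that should see the newest files, since it only updates the metadata as of the moment it runs.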

Tags: apache-spark hive apache-spark-sql parquet
