How to read an ORC transactional Hive table in Spark?

Posted 2019-07-29 11:51

    I am facing an issue while reading an ORC transactional table through Spark: I get the schema of the Hive table, but I am not able to read the actual data.

    See the complete scenario:

      hive> create table default.Hello(id int,name string) clustered by (id) into 2 buckets STORED AS ORC TBLPROPERTIES ('transactional'='true');

      hive> insert into default.hello values(10,'abc');

      Now when I try to access the Hive ORC data from Spark SQL, it shows only the schema:

      spark.sql("select * from hello").show()

      Output: only the column headers id and name are printed; no rows are returned.

3 answers
贪生不怕死
Answer 1 · 2019-07-29 12:08

You would need to add an action at the end to force it to run the query:

spark.sql("Select * From Hello").show()

(By default, .show() displays 20 rows.)

or

spark.sql("Select * From Hello").take(2)

to see 2 rows of output data.

These are just examples of actions that can be taken on a DataFrame.
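For reference, a minimal sketch of these and a few other DataFrame actions, assuming a spark-shell (Scala) session where spark is a SparkSession with Hive support:

val df = spark.sql("select * from default.hello")
df.show()         // action: prints up to 20 rows (the default)
df.show(2)        // action: prints up to 2 rows
df.take(2)        // action: returns the first 2 rows as an Array[Row]
df.count()        // action: returns the number of rows as a Long
df.printSchema()  // not an action: prints the schema without running a job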

孤傲高冷的网名
Answer 2 · 2019-07-29 12:12

Yes, as a workaround we can use compaction, but when the job is a micro-batch, compaction won't help. So I decided to use a JDBC call instead. Please refer to my answer for this issue on my GitHub page: https://github.com/Gowthamsb12/Spark/blob/master/Spark_ACID
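As a rough illustration of the JDBC approach, here is a minimal sketch. The host, port, and database here are placeholders (not from the original answer), and it assumes HiveServer2 is running and the Hive JDBC driver is on Spark's classpath:

// Assumes a SparkSession `spark`, HiveServer2 at localhost:10000 (placeholder),
// and the Hive JDBC driver (org.apache.hive.jdbc.HiveDriver) on the classpath.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:hive2://localhost:10000/default")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .option("dbtable", "default.hello")
  .load()
df.show()  // HiveServer2 reads through Hive's ACID reader, so the rows come back

Because the query is executed by HiveServer2 rather than by Spark's own ORC reader, the ACID delta files are resolved on the Hive side; note that the Hive JDBC driver may return column names prefixed with the table name.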

时光不老,我们不散
Answer 3 · 2019-07-29 12:19

Spark (as of version 2.3) is not fully compatible with Hive transactional tables. The workaround is to run a compaction on the table after any transaction:

ALTER TABLE Hello COMPACT 'major';

After the compaction finishes, you should be able to see the data. (Compaction runs asynchronously, so it can take some time.)
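To check when the compaction has finished, you can watch its state in Hive (a sketch following the hive> convention above; the exact state names vary by Hive version):

hive> ALTER TABLE hello COMPACT 'major';
hive> SHOW COMPACTIONS;
-- look for the row for default.hello; its state should move from
-- 'initiated' to 'working' and finally to 'ready for cleaning'

Once the compaction has completed, re-run spark.sql("select * from hello").show() and the inserted row should appear.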
