How to read an ORC transactional Hive table in Spark?

Posted 2019-07-29 11:51

    I am facing an issue while reading an ORC transactional table through Spark: I get the schema of the Hive table, but I am not able to read the actual data.

    See the complete scenario:

      hive> create table default.Hello(id int,name string) clustered by (id) into 2 buckets STORED AS ORC TBLPROPERTIES ('transactional'='true');

      hive> insert into default.hello values(10,'abc');

      Now when I try to access the Hive ORC data from Spark SQL, it shows only the schema:

      spark.sql("select * from hello").show()

      Output: only the column headers id and name are printed; no rows are returned.

3 answers
贪生不怕死
Answer 1 · 2019-07-29 12:08

You would need to add an action at the end to force it to run the query:

spark.sql("Select * From Hello").show()

(By default, .show() displays 20 rows.)

or

spark.sql("Select * From Hello").take(2)

to see 2 rows of output data.

These are just examples of actions that can be taken on a DataFrame.
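For reference, a minimal sketch of these and a few other DataFrame actions, assuming a spark-shell (Scala) session where spark is a SparkSession with Hive support:

val df = spark.sql("select * from default.hello")
df.show()         // action: prints up to 20 rows (the default)
df.show(2)        // action: prints up to 2 rows
df.take(2)        // action: returns the first 2 rows as an Array[Row]
df.count()        // action: returns the number of rows as a Long
df.printSchema()  // not an action: prints the schema without running a job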

孤傲高冷的网名
Answer 2 · 2019-07-29 12:12

Yes, as a workaround we can use compaction, but when the job is a micro-batch, compaction won't help. So I decided to use a JDBC call instead. Please refer to my answer for this issue on my GitHub page: https://github.com/Gowthamsb12/Spark/blob/master/Spark_ACID
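As a rough illustration of the JDBC approach, here is a minimal sketch. The host, port, and database here are placeholders (not from the original answer), and it assumes HiveServer2 is running and the Hive JDBC driver is on Spark's classpath:

// Assumes a SparkSession `spark`, HiveServer2 at localhost:10000 (placeholder),
// and the Hive JDBC driver (org.apache.hive.jdbc.HiveDriver) on the classpath.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:hive2://localhost:10000/default")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .option("dbtable", "default.hello")
  .load()
df.show()  // HiveServer2 reads through Hive's ACID reader, so the rows come back

Because the query is executed by HiveServer2 rather than by Spark's own ORC reader, the ACID delta files are resolved on the Hive side; note that the Hive JDBC driver may return column names prefixed with the table name.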

时光不老,我们不散
Answer 3 · 2019-07-29 12:19

Spark (as of version 2.3) is not fully compatible with Hive transactional tables. The workaround is to run a compaction on the table after any transaction:

ALTER TABLE Hello COMPACT 'major';

After the compaction finishes, you should be able to see the data. (Compaction runs asynchronously, so it can take some time.)
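To check when the compaction has finished, you can watch its state in Hive (a sketch following the hive> convention above; the exact state names vary by Hive version):

hive> ALTER TABLE hello COMPACT 'major';
hive> SHOW COMPACTIONS;
-- look for the row for default.hello; its state should move from
-- 'initiated' to 'working' and finally to 'ready for cleaning'

Once the compaction has completed, re-run spark.sql("select * from hello").show() and the inserted row should appear.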
