How to read an ORC transactional Hive table in Spark?
I am facing an issue while reading an ORC transactional table through Spark: I get the schema of the Hive table, but I am not able to read the actual data. Here is the complete scenario:
hive> create table default.Hello(id int,name string) clustered by (id) into 2 buckets STORED AS ORC TBLPROPERTIES ('transactional'='true');
hive> insert into default.hello values(10,'abc');
Now when I try to access the Hive ORC data from Spark SQL, it shows only the schema:
spark.sql("select * from hello").show()
Output: only the column headers id and name are printed, with no rows of data.
You would need to add an action at the end to force Spark to run the query, for example .show() (the default is to display 20 rows) or .show(2) to see 2 rows of output data. These are just examples of actions that can be applied to a DataFrame.
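A minimal sketch of such actions, assuming a SparkSession named spark and the default.hello table from the question:

val df = spark.sql("select * from default.hello")
df.show()   // action: prints up to 20 rows by default
df.show(2)  // action: prints only the first 2 rows
df.take(2)  // action: returns the first 2 rows as an Array[Row]
df.count()  // action: returns the number of rows as a Long

Each of these triggers execution of the query; without an action, Spark only builds the logical plan and nothing is read.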
Yes, as a workaround we can use compaction, but when the job is a micro-batch, compaction won't help, so I decided to use a JDBC call instead. Please refer to my answer for this issue in the link below, or see my GitHub page - https://github.com/Gowthamsb12/Spark/blob/master/Spark_ACID
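A rough sketch of the JDBC approach (reading the table through HiveServer2 so that Hive itself resolves the ACID delta files, rather than Spark reading the ORC files directly); the host, port, and credentials are placeholders, and the exact options used in the linked answer may differ:

val acidDf = spark.read
  .format("jdbc")
  .option("url", "jdbc:hive2://<hiveserver2-host>:10000/default")  // placeholder HiveServer2 URL
  .option("driver", "org.apache.hive.jdbc.HiveDriver")             // Hive JDBC driver must be on the classpath
  .option("dbtable", "default.hello")                              // the transactional table from the question
  .option("user", "<user>")                                        // placeholder credentials
  .option("password", "<password>")
  .load()

acidDf.show()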
Spark is not, as of right now (version 2.3), fully compliant with Hive transactional tables. The workaround is to run a compaction on the table after any transaction; once the compaction completes (it may take some time), the data becomes visible to Spark.
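For reference, a major compaction can be requested manually from Hive (standard HiveQL, using the table from the question); the compaction runs asynchronously, so the data only becomes readable from Spark once it has finished:

hive> alter table default.hello compact 'major';
hive> show compactions;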