I am new to Spark. I found that using HiveContext we can connect to Hive and run HiveQL queries. I ran one and it worked.

My doubt is whether Spark does this through Spark jobs. That is, does it use HiveContext only to access the corresponding Hive table files from HDFS, or does it internally call Hive to execute the query?
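For reference, here is roughly what I ran in the spark-shell (the table name is just a placeholder):

```scala
import org.apache.spark.sql.hive.HiveContext

// sc is the SparkContext that spark-shell creates automatically
val hiveContext = new HiveContext(sc)

// Runs a HiveQL query against a table registered in the Hive metastore;
// "my_table" is just a placeholder
val df = hiveContext.sql("SELECT * FROM my_table LIMIT 10")
df.show()
```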
No, Spark doesn't call Hive to execute the query. Spark only reads the metadata from Hive and executes the query within the Spark engine. Spark has its own SQL execution engine, which includes components such as Catalyst and Tungsten to optimize queries and return results faster. It uses the metadata from Hive and the execution engine of Spark to run the queries.
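You can check this yourself: print the query plan and you will see Spark physical operators produced by Catalyst, not a MapReduce job. A minimal sketch in the spark-shell, assuming a Hive table named `my_table` exists:

```scala
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc) // sc comes from the spark-shell

// explain(true) prints the parsed, analyzed, optimized (Catalyst)
// and physical plans. Every operator in them is a Spark operator;
// nothing is handed off to Hive's execution engine.
hiveContext.sql("SELECT COUNT(*) FROM my_table").explain(true)
```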
One of the greatest advantages of Hive is its metastore. It acts as a single metastore for a lot of components in the Hadoop ecosystem.
Coming to your question: when you use HiveContext, it gets access to the metastore database and all of your Hive metadata, which describes what type of data you have, where the data is stored, the serialization and deserialization (SerDe) settings, compression codecs, columns, data types, and literally every detail about the table and its data. That is enough for Spark to understand the data.
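For example, you can print exactly what Spark gets from the metastore using plain HiveQL (again assuming a table named `my_table`):

```scala
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc) // sc from the spark-shell

// DESCRIBE FORMATTED returns the metastore details mentioned above:
// columns and data types, the HDFS location of the table files,
// input/output formats, the SerDe library, compression, and so on.
hiveContext.sql("DESCRIBE FORMATTED my_table").show(100, false)
```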
Overall, Spark only needs the metastore, which gives complete details of the underlying data; once it has that metadata, it executes the queries you ask for on its own execution engine. Hive is slower than Spark because it uses MapReduce, so there is no point in going back to Hive and asking it to run the query.
Let me know if this answers your question.