I am new to Spark SQL but familiar with Hive's query execution framework. I would like to understand, at a technical level, how Spark executes SQL queries.
If I run the commands below:
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("select count(distinct(id)) from test.emp").collect
In Hive this would be converted into a MapReduce job, but how does it get executed in Spark?
How does the Hive metastore come into the picture?
Thanks in advance.
To answer your question briefly: no, HiveContext will not start a MapReduce job. Your SQL query will still use the Spark engine.
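You can verify this yourself by asking Spark for the query plan instead of executing the query: the physical plan is built from Spark operators, not map/reduce stages. A minimal sketch, assuming a Spark shell where `sc` is already defined and a `test.emp` table exists in the metastore:

```scala
// Sketch for a Spark 1.x shell session; assumes `sc` (SparkContext)
// exists and that test.emp is registered in the Hive metastore.
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

val df = sqlContext.sql("select count(distinct(id)) from test.emp")

// Print the logical and physical plans without running the query.
// The physical plan consists of Spark operators (aggregates, shuffle
// exchanges, table scans) -- there is no map-reduce stage anywhere.
df.explain(true)

// Only an action such as collect() actually triggers execution,
// which runs as Spark jobs, stages, and tasks.
df.collect()
```

In other words, HiveContext changes how the query is parsed and where table metadata comes from, not which engine runs it.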
As the Spark documentation explains, HiveContext enhances Spark's query parsing (HiveQL support) and gives access to existing Hive tables, and it can even persist your result DataFrames as Hive tables. Going further, Hive itself can use Spark as its execution engine instead of MR or Tez.
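Persisting a result back into the metastore can be sketched as follows (the output table name `test.emp_ids` is made up for illustration; assumes a HiveContext as above):

```scala
// Assumes sqlContext is a HiveContext and test.emp exists in the metastore.
val ids = sqlContext.sql("select distinct id from test.emp")

// saveAsTable writes the DataFrame out and registers it in the Hive
// metastore, so plain Hive clients can also query it afterwards.
ids.write.saveAsTable("test.emp_ids")  // hypothetical table name
```

After this, `test.emp_ids` behaves like any other Hive table, regardless of which engine later reads it.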
The Hive metastore holds metadata about Hive tables (schemas, storage locations, partitions). When you use HiveContext, Spark connects to this metastore service to resolve table names like test.emp. Refer to the documentation: http://spark.apache.org/docs/latest/sql-programming-guide.html
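Concretely, Spark locates the metastore through a hive-site.xml on its classpath (typically Spark's conf/ directory). A minimal sketch pointing Spark at an existing remote metastore service, with a placeholder host and port you would adjust to your deployment:

```xml
<!-- conf/hive-site.xml: thrift URI below is a placeholder -->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>
```

Without this file, Spark falls back to creating a local metastore (metastore_db) in the current directory.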