I am trying to understand Hive in terms of architecture, and I am referring to Tom White's book on Hadoop. I came across the following terms with regard to Hive: Hive Services, hiveserver2, and metastore, among others.
I am referring to the diagrams below from the book (Hadoop: The Definitive Guide).
Hive Architecture:
MetaStore configuration:
Hive Architecture which shows what "Driver" is:
I am not able to understand the following:
1) What is Hive Services in the Hive architecture diagram? Is it the same thing as hiveserver2?
2) What is Driver in the Hive architecture diagram?
3) What is MetaStore (I am NOT referring to the Metastore database)? Is it a process that runs? If so, is it part of hiveserver2? According to the diagram, MetaStore can be remote, so if it is a JVM process, which component does it belong to?
4) It says Hive service JVM, MetaStore JVM Server. But where do these components get installed? Are they part of the "server" side of Hive?
5) The "Hive Architecture" diagram says "Hive Server". What is this? Is it what we call "Hive Server 1" or "Hive Server 2"?
Can anyone help me understand this?
Hive Services
Driver
The JDBC/ODBC or Thrift interfaces have drivers.
There is also the part that interprets the query and compiles it down to execution engine code. I personally call that an interpreter or compiler, not a driver.
Metastore Server
Not part of HiveServer2. It is literally a process running on top of an RDBMS (yes, you still need these when running Hive & Hadoop).
Databases commonly used behind a remote Metastore server = Oracle, MySQL, Postgres
Embedded Metastore (not recommended for production) = Derby
See Hive Wiki
Metastore JVM
The orange boxes show that you can deploy these services either in the same JVM as the driver (interpreter) or as a remote server. The wiki describes these setups.
I believe this is a side-car process that maps the HiveServer2 queries to the MetaStore queries. For example, how do you translate the HiveQL into a process that reads metadata from MySQL or Postgres?
It can run on the server-side, yes, but this is not a recommended setup for fault tolerance and performance reasons.
HiveServer1 is deprecated. Feel free to read about it, but don't use it.
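To make the embedded / local / remote distinction from the metastore diagram more concrete, here is a minimal sketch in Python. It only inspects two real hive-site.xml properties, hive.metastore.uris and javax.jdo.option.ConnectionURL, and guesses which of the three setups they describe. The function name metastore_mode and the host names are made up for illustration, and the rules are a simplification of how Hive decides, not Hive's actual code.

```python
# Default JDBC URL Hive uses when nothing is configured: an embedded Derby database.
DEFAULT_JDBC_URL = "jdbc:derby:;databaseName=metastore_db;create=true"


def metastore_mode(conf):
    """Guess the metastore setup from a dict of hive-site.xml properties.

    Hypothetical helper for illustration only, not part of Hive.
    Returns one of "embedded", "local", or "remote".
    """
    # If Thrift URIs are set, clients talk to a separate Metastore server JVM.
    uris = conf.get("hive.metastore.uris", "").strip()
    if uris:
        return "remote"

    # Otherwise the metastore service runs inside the Hive service JVM.
    jdbc_url = conf.get("javax.jdo.option.ConnectionURL", DEFAULT_JDBC_URL)

    # "jdbc:derby:" without "//" means Derby is embedded in the same JVM too.
    if jdbc_url.startswith("jdbc:derby:") and not jdbc_url.startswith("jdbc:derby://"):
        return "embedded"

    # Any other JDBC URL points at a database running as its own process.
    return "local"


if __name__ == "__main__":
    print(metastore_mode({}))
    print(metastore_mode({"javax.jdo.option.ConnectionURL": "jdbc:mysql://db-host/hive"}))
    print(metastore_mode({"hive.metastore.uris": "thrift://metastore-host:9083"}))
```

Run as a script, this prints embedded, local, and remote in that order. In the remote case the Metastore is its own JVM process that HiveServer2 (or any other client) reaches over Thrift, which is why it is not considered part of HiveServer2.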
My understanding is:
Hive Services includes HS2 (sometimes called the Thrift server), Driver, Compiler, and Execution Engine. But these four components (HS2, Driver, Compiler, Execution Engine) all run in the hiveserver2 process. So in Hive, there are three processes: