I am running a Hive query as below:
Select count(*),group_name from table_name group by group_name;
Status: Running (Executing on YARN cluster with App id XXXX)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 54 54 0 0 0 0
Reducer 2 ...... SUCCEEDED 13 13 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 24.93 s
--------------------------------------------------------------------------------
OK
Result
Time taken: 26.786 seconds, Fetched: 10 row(s)
The timings above look accurate when map/reduce stages are involved. But when I run a simple query like this:
select group_name from table_name
Time taken: 0.771 seconds, Fetched: 14 row(s)
The time above does not look correct.
Also, any idea how to measure query time more accurately would be greatly appreciated.
Thanks in advance.
Measure the time from a shell script. There is a time command; call your hive command through it like this:
time hive -e "select group_name from table_name"
The time command outputs three times: real, user and sys.
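The same idea can be wrapped in a small reusable function. This is a minimal sketch assuming bash: the `TIMEFORMAT` variable belongs to bash's `time` keyword, not to `/usr/bin/time`, and `time_cmd` is a hypothetical helper name, not part of Hive or bash. It prints real, user and sys seconds on stderr after the wrapped command finishes.

```shell
#!/usr/bin/env bash
# time_cmd is a hypothetical helper name, not part of Hive or bash.
# It runs any command under bash's `time` keyword and prints the
# three values on stderr as "real <s> user <s> sys <s>".
time_cmd() {
  # TIMEFORMAT is read by bash's `time` keyword:
  # %R = real (wall clock), %U = user CPU, %S = system CPU, in seconds.
  local TIMEFORMAT='real %R user %U sys %S'
  time "$@"
}

# Usage with the query from the question (needs the hive CLI on PATH):
#   time_cmd hive -e "select group_name from table_name"
```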
Real is what you probably want to know. Real is wall-clock time: the time from start to finish of the call. It is all elapsed time, including time slices used by other processes and time the process spends blocked (for example, waiting for I/O to complete).
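A quick way to see the difference between the three values, as a sketch assuming bash and a `sleep` command on PATH: `sleep` spends almost all of its time blocked, so real is about one second while user and sys stay near zero. A client that mostly waits for a result behaves similarly, which is why real, not user or sys, reflects how long the call took.

```shell
#!/usr/bin/env bash
# Print only the three numbers (real, user, sys), separated by spaces.
TIMEFORMAT='%R %U %S'

# `time` writes its report to the shell's stderr, so group the command
# in { ...; } and redirect that group's stderr into the capture.
report=$( { time sleep 1; } 2>&1 )
read -r real user sys <<< "$report"

echo "real=$real user=$user sys=$sys"
```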
See also this question: How do I get just real time value from 'time' command?
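For getting just the real value, one bash approach (assuming bash, since `TIMEFORMAT=%R` is bash-specific) is to capture only the elapsed seconds into a variable, for example to log several queries. `real_seconds` is a hypothetical helper; it discards the wrapped command's own output so that only the timing reaches the capture.

```shell
#!/usr/bin/env bash
# real_seconds is a hypothetical helper: it runs a command, throws away
# the command's stdout/stderr, and prints only the wall-clock seconds.
real_seconds() {
  local TIMEFORMAT=%R
  # The inner redirections silence the timed command itself; the outer
  # 2>&1 sends the timing report (written to stderr by `time`) to stdout.
  { time "$@" >/dev/null 2>&1; } 2>&1
}

# Usage with the query from the question (needs the hive CLI on PATH):
#   elapsed=$(real_seconds hive -e "select group_name from table_name")
#   echo "query took ${elapsed}s"
```

The redirections attached to the command apply only to that command, not to the report produced by `time`, which is why the command's output can be silenced while the timing is still captured.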