In case I am running a below hive query on table with certain partitioned column, I want to make sure hive does not do full table scan and just figure out the result from meta data itself. Is there any way to enable this ?
Select max(partitioned_col) from hive_table ;
Right now , when I am running this query , its launching map reduce tasks and I am sure its doing data scan while it can very well figure out the value from metadata itself.
Compute table statistics every time you changed data.
Enable CBO and statistics auto gathering:
Use these settings to enable CBO using statistics:
If nothing helps I'd recommend to apply this approach for finding last partition fast: Parse max partition key using shell script from the table location. The command below will print all table folder paths, sort, take latest sorted, take last subfolder name, parse partition folder name and extract value. All you need is to initialize
TABLE_DIR
variable and putthe number of partition subfolder in the path
:Then use
$last_partition
variable to pass to your script as