Compute HIVE statistics in Apache Spark

2019-08-11 17:01发布

I'm trying to compute HIVE table statistic from Apache Spark:

`sqlCtx.sql('ANALYZE TABLE t1 COMPUTE STATISTICS')`

I also execute statement to see what was collected:

sqlCtx.sql('DESC FORMATTED t1')

I can see my stats was collected. However when I execute same staement in HIVE client (Ambari) - there are no statistics displayed. Is it available only to Spark if it's collected by Spark? Does spark store it somewhere else?

Another question.

I also computing stats for all columns in that table:

sqlCtx.sql('ANALYZE TABLE t1 COMPUTE STATISTICS FOR COLUMNS c1,c2')

But when I want to see this stats in spark, it failed with unsupported sql statement exception:

sqlCtx.sql('DESC FORMATTED t1 c1')

According to docs it's valid hive queries. What is wrong with it?

Thanks for help.

标签： apache-spark hive

2条回答

倾城　Initia

2楼-- · 2019-08-11 17:39

Apache Spark stores statistics as "Table parameters". To be able retrieve these stats, we need to connect to HIVE metastore and . execute query like following

select param_key, param_value 
from table_params tp, tbls t 
where tp.tbl_id=t.tbl_id and tbl_name = '<table_name>' 
and param_key like 'spark.sql.stat%';

0人赞添加讨论(0) 举报

何必那么认真

3楼-- · 2019-08-11 17:58

just uppercase the name of table will be ok.

select param_key, param_value 
from TABLE_PARAMS tp, TBLS t 
where tp.tbl_id=t.tbl_id and tbl_name = '<table_name>' 
and param_key like 'spark.sql.stat%';

0人赞添加讨论(0) 举报

Compute HIVE statistics in Apache Spark

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间