I'm trying to compute Hive table statistics from Apache Spark:
`sqlCtx.sql('ANALYZE TABLE t1 COMPUTE STATISTICS')`
I also execute a statement to see what was collected:
`sqlCtx.sql('DESC FORMATTED t1')`
I can see that my stats were collected. However, when I execute the same statement in the Hive client (Ambari), no statistics are displayed. Are statistics collected by Spark available only to Spark? Does Spark store them somewhere else?
Another question.
I am also computing statistics for all columns in that table:
`sqlCtx.sql('ANALYZE TABLE t1 COMPUTE STATISTICS FOR COLUMNS c1,c2')`
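To cover every column without typing each name by hand, the statement can be built from a column list. This is a minimal sketch in Python; `analyze_columns_sql` is a hypothetical helper, and with a live `SQLContext` the list could come from `sqlCtx.table('t1').columns`. Column statistics may not be supported for every data type (complex types, for example), so the generated statement can still be rejected; verify against your Spark version.

```python
def analyze_columns_sql(table, columns):
    """Build an ANALYZE TABLE ... FOR COLUMNS statement (hypothetical helper)."""
    if not columns:
        raise ValueError("at least one column name is required")
    # Backtick-quote each identifier; a literal backtick is escaped by doubling it.
    quoted = ", ".join("`" + c.replace("`", "``") + "`" for c in columns)
    return "ANALYZE TABLE {} COMPUTE STATISTICS FOR COLUMNS {}".format(table, quoted)


if __name__ == "__main__":
    # With a live session this could be fed the table's real column list, e.g.
    # sqlCtx.sql(analyze_columns_sql('t1', sqlCtx.table('t1').columns))
    print(analyze_columns_sql("t1", ["c1", "c2"]))
```

Quoting the identifiers keeps the statement valid even if a column name contains spaces or reserved words.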
But when I try to view these stats in Spark, it fails with an unsupported SQL statement exception:
`sqlCtx.sql('DESC FORMATTED t1 c1')`
According to the docs, these are valid Hive queries. What is wrong?
Thanks for the help.
Apache Spark stores statistics as "Table parameters". To retrieve these stats, we need to connect to the Hive metastore database and query those table parameters directly.
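A minimal sketch of such a query, in Python. It assumes the standard Hive metastore schema (`TBLS` holding `TBL_ID`/`TBL_NAME`, `TABLE_PARAMS` holding `TBL_ID`/`PARAM_KEY`/`PARAM_VALUE`) and that Spark 2.x writes its statistics under keys prefixed with `spark.sql.statistics.`; check both against your versions. An in-memory SQLite database stands in for the real metastore backend (usually MySQL or PostgreSQL), and the sample rows and the helper names `spark_stats` and `make_demo_metastore` are hypothetical.

```python
import sqlite3

# Hypothetical, minimal stand-in for the metastore backend: only the columns
# the query below touches are modelled.
SCHEMA = """
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, TBL_NAME TEXT);
CREATE TABLE TABLE_PARAMS (TBL_ID INTEGER, PARAM_KEY TEXT, PARAM_VALUE TEXT);
"""

# Join each table to its parameters and keep only the Spark-written statistics
# (assumed key prefix: "spark.sql.statistics.").
STATS_QUERY = """
SELECT tp.PARAM_KEY, tp.PARAM_VALUE
FROM TABLE_PARAMS tp
JOIN TBLS t ON tp.TBL_ID = t.TBL_ID
WHERE t.TBL_NAME = ?
  AND tp.PARAM_KEY LIKE 'spark.sql.statistics.%'
ORDER BY tp.PARAM_KEY
"""


def spark_stats(conn, table_name):
    """Return {param_key: param_value} for the Spark statistics of table_name."""
    return dict(conn.execute(STATS_QUERY, (table_name,)).fetchall())


def make_demo_metastore():
    """Build an in-memory metastore stand-in with made-up rows for table t1."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO TBLS VALUES (1, 't1')")
    conn.executemany(
        "INSERT INTO TABLE_PARAMS VALUES (1, ?, ?)",
        [
            ("spark.sql.statistics.numRows", "100"),     # sample value
            ("spark.sql.statistics.totalSize", "2048"),  # sample value
            ("transient_lastDdlTime", "1500000000"),     # non-Spark key, filtered out
        ],
    )
    return conn


if __name__ == "__main__":
    print(spark_stats(make_demo_metastore(), "t1"))
```

Against a real metastore you would run the same `SELECT` through that database's own driver; note that the parameter placeholder differs between drivers (`?` in `sqlite3`, `%s` in most MySQL and PostgreSQL drivers).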
Just uppercasing the table name should be enough.