Programm Sketch
- I create a HiveContext
hiveContext
. - With that context, I create a DataFrame
df
from a JDBC relational table. - I register the DataFrame
df
viadf.registerTempTable("TESTTABLE")
. - I start a HiveThriftServer2 via
HiveThriftServer2.startWithContext(hiveContext)
.
The TESTTABLE contains 1,000,000 entries, columns are ID (INT) and NAME (VARCHAR)
+-----+--------+
| ID | NAME |
+-----+--------+
| 1 | Hello |
| 2 | Hello |
| 3 | Hello |
| ... | ... |
With Beeline I access the SQL Endpoint (at port 10000) of the HiveThriftServer and perform a query. E.g.
SELECT * FROM TESTTABLE WHERE ID='3'
When I inspect the QueryLog of the DB with the SQL Statements executed I see
/*SQL #:1000000 t:657*/ SELECT \"ID\",\"NAME\" FROM test;
So there happens no predicate pushdown , as the where clause is missing.
Questions
This gives raise to the following questions:
- Why is no predicate pushdown performed?
- Can this be changed by not using registerTempTable?
- If so, how? Or is this a known restriction of the HiveThriftServer?
Counterexample
If I create a DataFrame df
in Spark SQLContext and call
df.filter( df("ID") === 3).show()
I observe
/*SQL #:1*/SELECT \"ID\",\"NAME\" FROM test WHERE ID = 3;
as expected.