I am using Apache Phoenix 4.5.2 from Cloudera labs distribution, which is installed over CDH 5.4 cluster. Now I'm trying to use it from Pentaho BA 5.4 server with embedded Mondrian and SAIKU Plugin installed.
I'm planning to use is as aggregator for Pentaho Mondrian ROLAP engine. So I have imported about 65 millions facts into fact table via slightly customized Pentaho Data Integration(if someone's interested, I added UPSERT
to Table Output step, set Commit size
to -1
, set thin driver phoenix-<version>-query-server-thin-client.jar
url to point to Apache Query Server and enabled its autocommit in hbase-site.xml
via phoenix.connection.autoCommit
), and now I have about 400 rows in time dimension table.
The problem is that Mondrian generates queries assuming that the order of table does not matter. It generates Cartesian join with FROM
statement where dimension table come first and fact table comes last. If I change the order of tables, query works successfully.
This ends with Phoenix trying to cache 65 M rows table into memory, so I get org.apache.phoenix.join.MaxServerCacheSizeExceededException: Size of hash cache (104857626 bytes) exceeds the maximum allowed size (104857600 bytes)
.
Aside from building custom Mondrian which will place fact table first, is there any hint or index trick to force Phoenix iterate over facts table first, because for me it's no-brainer that it should iterate over 65M row table and hash join it with much smaller dimension table?
Exception stack trace:
Caused by: mondrian.olap.MondrianException: Mondrian Error:Internal error: Error while loading segment; sql=[select "DAYS"."DAY" as "c0", sum("account_transactions"."AMOUNT") as "m0" from "DAYS" as "DAYS", "account_transactions" as "account_transactions" where "account_transactions"."DATE" = "DAYS"."DATE" group by "DAYS"."DAY"]
at mondrian.resource.MondrianResource$_Def0.ex(MondrianResource.java:972)
at mondrian.olap.Util.newInternal(Util.java:2404)
at mondrian.olap.Util.newError(Util.java:2420)
at mondrian.rolap.SqlStatement.handle(SqlStatement.java:353)
at mondrian.rolap.SqlStatement.execute(SqlStatement.java:253)
at mondrian.rolap.RolapUtil.executeQuery(RolapUtil.java:350)
at mondrian.rolap.agg.SegmentLoader.createExecuteSql(SegmentLoader.java:625)
... 8 more
Caused by: java.sql.SQLException: Encountered exception in sub plan [0] execution.
at org.apache.phoenix.execute.HashJoinPlan.iterator(HashJoinPlan.java:171)
at org.apache.phoenix.execute.HashJoinPlan.iterator(HashJoinPlan.java:121)
at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:266)
at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:256)
at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:255)
at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:1409)
at org.apache.commons.dbcp.DelegatingStatement.executeQuery(DelegatingStatement.java:208)
at org.apache.commons.dbcp.DelegatingStatement.executeQuery(DelegatingStatement.java:208)
at mondrian.rolap.SqlStatement.execute(SqlStatement.java:200)
... 10 more
Caused by: org.apache.phoenix.join.MaxServerCacheSizeExceededException: Size of hash cache (104857626 bytes) exceeds the maximum allowed size (104857600 bytes)
at org.apache.phoenix.join.HashCacheClient.serialize(HashCacheClient.java:109)
at org.apache.phoenix.join.HashCacheClient.addHashCache(HashCacheClient.java:82)
at org.apache.phoenix.execute.HashJoinPlan$HashSubPlan.execute(HashJoinPlan.java:353)
at org.apache.phoenix.execute.HashJoinPlan$1.call(HashJoinPlan.java:145)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask.run(JobManager.java:183)
... 3 more