Problems with Apache Kylin and Hive-JDBC-Storage-Handler

Posted 2019-09-10 11:49

Question:

I'm using the Hive-JDBC-Storage-Handler to create an external table in Hive:
Reference: https://github.com/qubole/Hive-JDBC-Storage-Handler

Hadoop info:

- Hadoop 2.7.3
- Hive 1.2.1
Schema:

CREATE EXTERNAL TABLE tbl_google_stats_adgroups_summary
(
   campaignid                BIGINT COMMENT 'from deserializer',
   adgroupid                 BIGINT COMMENT 'from deserializer',
   localadgroupid            BIGINT COMMENT 'from deserializer',
   position                  FLOAT COMMENT 'from deserializer',
   cost                      FLOAT COMMENT 'from deserializer',
   impression                INT COMMENT 'from deserializer',
   clicks                    INT COMMENT 'from deserializer',
   conversions               INT COMMENT 'from deserializer',
   conversionsbydate         INT COMMENT 'from deserializer',
   uniqueconversions         INT COMMENT 'from deserializer',
   uniqueconversionsbydate   INT COMMENT 'from deserializer',
   datestats                 TIMESTAMP COMMENT 'from deserializer',
   quantity                  INT COMMENT 'from deserializer',
   quantitybydate            INT COMMENT 'from deserializer',
   revenue                   FLOAT COMMENT 'from deserializer',
   revenuebydate             FLOAT COMMENT 'from deserializer',
   uniquerevenue             FLOAT COMMENT 'from deserializer',
   uniquerevenuebydate       FLOAT COMMENT 'from deserializer',
   deviceid                  INT COMMENT 'from deserializer',
   conv1perclick             INT COMMENT 'from deserializer',
   adwordstype               INT COMMENT 'from deserializer'
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcSerDe'
STORED BY 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler'
WITH SERDEPROPERTIES (
   'serialization.format' = '1'
)
TBLPROPERTIES (
   'mapred.jdbc.driver.class' = 'com.mysql.jdbc.Driver',
   'mapred.jdbc.hive.lazy.split' = 'false',
   'mapred.jdbc.input.table.name' = 'tbl_adgroup',
   'mapred.jdbc.password' = '',
   'mapred.jdbc.url' = 'jdbc:mysql://localhost:3306/<database_name>?useUnicode=true&useJDBCCompliantTimezoneShift=true&useLegacyDatetimeCode=false&serverTimezone=UTC',
   'mapred.jdbc.username' = 'root'
)
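
For reference, a plain SELECT against this external table does read rows through to MySQL (a minimal sanity check; the column subset is chosen arbitrarily):

-- Reads rows from MySQL through the JDBC storage handler.
SELECT campaignid, adgroupid, cost
FROM tbl_google_stats_adgroups_summary
LIMIT 10;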

Data syncs from MySQL correctly, and I then used Apache Kylin to build a demo cube on this table. However, the first step of the cube build, "Create Intermediate Flat Hive Table", failed, and the failing query was:

INSERT OVERWRITE TABLE default.kylin_intermediate_test_cube_adgroup_mysql_164b0ca3_6050_49bb_838b_49ee49f6d1e5 SELECT
    TBL_GOOGLE_STATS_ADGROUPS_SUMMARY.CAMPAIGNID
    ,TBL_GOOGLE_STATS_ADGROUPS_SUMMARY.ADGROUPID
    ,TBL_GOOGLE_STATS_ADGROUPS_SUMMARY.POSITION
    ,TBL_GOOGLE_STATS_ADGROUPS_SUMMARY.DATESTATS
    ,TBL_GOOGLE_STATS_ADGROUPS_SUMMARY.DEVICEID
    ,TBL_GOOGLE_STATS_ADGROUPS_SUMMARY.ADWORDSTYPE
    ,TBL_GOOGLE_STATS_ADGROUPS_SUMMARY.COST
    FROM <database>.TBL_GOOGLE_STATS_ADGROUPS_SUMMARY as TBL_GOOGLE_STATS_ADGROUPS_SUMMARY ;

Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1478248621961_0005_1_00, diagnostics=[Task failed, taskId=task_1478248621961_0005_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: InputFormatWrapper can not support RecordReaders that don't return same key & value objects. current reader class : class org.apache.hadoop.mapreduce.lib.db.MySQLDBRecordReader
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: InputFormatWrapper can not support RecordReaders that don't return same key & value objects. current reader class : class org.apache.hadoop.mapreduce.lib.db.MySQLDBRecordReader
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
	... 14 more
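
For what it's worth, the trace shows the statement running on Tez (TezProcessor), and the wrapper is rejecting the MapReduce-API MySQLDBRecordReader. I have not verified this, but one thing I may try is forcing this step onto plain MapReduce, on the assumption that the mismatch is Tez-specific:

-- Assumption: the InputFormatWrapper/RecordReader mismatch is Tez-specific.
SET hive.execution.engine=mr;

If I understand Kylin correctly, Hive settings for the intermediate-table job can be placed in conf/kylin_hive_conf.xml.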

Can anyone help me? Besides that, I also have a question about the structure of our Hadoop stack:

We need a transactional SQL engine like MySQL, because some of our report data has to be updated after the fact. Hive does provide ACID tables, but they do not support UPDATE with INNER JOIN and similar statements, and that is a real problem for our business; a sketch of the kind of statement we depend on is below. That is why I set up the JdbcStorageHandler. Can this architecture handle billions of rows of data? Thanks a lot!
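
For illustration, this is the MySQL-style statement we depend on (tbl_google_stats_adgroups_summary and tbl_adgroup come from the schema above; the tbl_adgroup columns id, local_id, and status are hypothetical):

-- Works in MySQL: update summary rows by joining to the adgroup table.
-- The join columns (a.id, a.local_id, a.status) are illustrative only.
UPDATE tbl_google_stats_adgroups_summary s
INNER JOIN tbl_adgroup a ON s.adgroupid = a.id
SET s.localadgroupid = a.local_id
WHERE a.status = 1;

Hive ACID tables only accept the single-table form, UPDATE t SET col = val WHERE ..., so there is no direct equivalent on Hive 1.2.1.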