Apache Spark SQL issue : java.lang.RuntimeExceptio

2019-09-02 20:57发布

问题:

As per the my investigation on spark sql, come to know that more than 2 tables can't be joined directly, we have to use sub query to make it work. So I am using sub query and able to join 3 tables :

with following query :

"SELECT name, age, gender, dpi.msisdn, subscriptionType, maritalStatus, isHighARPU, ipAddress, startTime, endTime, isRoaming, dpi.totalCount, dpi.website FROM (SELECT subsc.name, subsc.age, subsc.gender, subsc.msisdn, subsc.subscriptionType, subsc.maritalStatus, subsc.isHighARPU, cdr.ipAddress, cdr.startTime, cdr.endTime, cdr.isRoaming FROM SUBSCRIBER_META subsc, CDR_FACT cdr WHERE subsc.msisdn = cdr.msisdn AND cdr.isRoaming = 'Y') temp, DPI_FACT dpi WHERE temp.msisdn = dpi.msisdn";

But when in the same pattern, i am trying to join 4 tables, It is throwing me following exception

java.lang.RuntimeException: [1.517] failure: identifier expected

Query to join 4 tables:

SELECT name, dueAmount FROM (SELECT name, age, gender, dpi.msisdn, subscriptionType, maritalStatus, isHighARPU, ipAddress, startTime, endTime, isRoaming, dpi.totalCount, dpi.website FROM (SELECT subsc.name, subsc.age, subsc.gender, subsc.msisdn, subsc.subscriptionType, subsc.maritalStatus, subsc.isHighARPU, cdr.ipAddress, cdr.startTime, cdr.endTime, cdr.isRoaming FROM SUBSCRIBER_META subsc, CDR_FACT cdr WHERE subsc.msisdn = cdr.msisdn AND cdr.isRoaming = 'Y') temp, DPI_FACT dpi WHERE temp.msisdn = dpi.msisdn) inner, BILLING_META billing where inner.msisdn = billing.msisdn

can anyone please help me making this query work?

Thanks in advance. Error is as follow:

09/02/2015 02:55:24 [ERROR] org.apache.spark.Logging$class: Error running job streaming job 1423479307000 ms.0
java.lang.RuntimeException: [1.517] failure: identifier expected

 SELECT name, dueAmount FROM (SELECT name, age, gender, dpi.msisdn, subscriptionType, maritalStatus, isHighARPU, ipAddress, startTime, endTime, isRoaming, dpi.totalCount, dpi.website FROM (SELECT subsc.name, subsc.age, subsc.gender, subsc.msisdn, subsc.subscriptionType, subsc.maritalStatus, subsc.isHighARPU, cdr.ipAddress, cdr.startTime, cdr.endTime, cdr.isRoaming FROM SUBSCRIBER_META subsc, CDR_FACT cdr WHERE subsc.msisdn = cdr.msisdn AND cdr.isRoaming = 'Y') temp, DPI_FACT dpi WHERE temp.msisdn = dpi.msisdn) inner, BILLING_META billing where inner.msisdn = billing.msisdn
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    ^
        at scala.sys.package$.error(package.scala:27)
        at org.apache.spark.sql.catalyst.SqlParser.apply(SqlParser.scala:60)
        at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:73)
        at org.apache.spark.sql.api.java.JavaSQLContext.sql(JavaSQLContext.scala:49)
        at com.hp.tbda.rta.examples.JdbcRDDStreaming5$7.call(JdbcRDDStreaming5.java:596)
        at com.hp.tbda.rta.examples.JdbcRDDStreaming5$7.call(JdbcRDDStreaming5.java:546)
        at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$1.apply(JavaDStreamLike.scala:274)
        at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$1.apply(JavaDStreamLike.scala:274)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:527)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:527)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:41)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
        at scala.util.Try$.apply(Try.scala:161)
        at org.apache.spark.streaming.scheduler.Job.run(Job.scala:32)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:172)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

回答1:

The exception occurred due to you have used the reserved keyword "inner" of Spark in your sql. Avoid using of Keywords in Spark SQL as custom identifier.