WITH clause in Spark SQL context not supported

Published 2019-07-19 04:32

Question:

I am trying to fetch records from a mainframe table using the Spark SQL context with the query below:

data_config.db2_qry = "SELECT A.E_No, A.E_Name FROM Scheema.Employee A WITH UR"

but it throws the following error:

com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-199, SQLSTATE=42601, SQLERRMC=WITH;HAVING WHERE GROUP ORDER INTERSECT MINUS EXCEPT UNION ) , FETCH, DRIVER=4.19.26

However, if I run the same query directly in the mainframe console, it works fine.

How can I use the WITH clause in the SQL context of Spark?

I am using Spark version 2.4.0.

I am retrieving the records like this:

filt_cond = "(" + data_config.db2_qry + ") ref_id"

db2Df = (sqlContext.read.format("jdbc")
        .option("url", data_config.db2_url)
        .option("driver", "com.ibm.db2.jcc.DB2Driver")
        .option("dbtable", filt_cond)
        .option("user", data_config.db2_uname)
        .option("password", data_config.db2_passwd)
        .load())

Answer 1:

The issue is in the query that gets sent down to mainframe DB2: the choice of Spark JDBC method used to push "WITH UR" needs to change.

The Spark JDBC read method used here is:

def jdbc(url: String, table: String, properties: Properties): DataFrame

With this method, say we push the following query down as the table argument:

"(select a, b, c, d from table where d is not null with UR) as table"

That is not the query that reaches the mainframe DB2 SQL engine. Spark sends the SQL as:

select a, b, c, d from (select a, b, c, d from table where d is not null with UR) as table

and this is where the trouble starts: DB2 accepts WITH UR only at the end of the full statement, so it rejects the clause once it is wrapped inside a subquery.

If you want to see the same error in mainframe SPUFI, QMF, or another tool, try running the query as constructed by Spark rather than the one written in the code.
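Applied to the query in the question, with ref_id as the alias added by filt_cond, the statement Spark constructs looks roughly like this (a sketch; Spark may quote the column names):

SELECT E_No, E_Name FROM (SELECT A.E_No, A.E_Name FROM Scheema.Employee A WITH UR) ref_id

which fails with SQLCODE -199 because WITH UR now sits inside a subquery.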

To overcome this issue when adding the "WITH UR" syntax to the SQL, switch from the Spark JDBC method above to the following one, which lets us supply predicates:

 def jdbc(url: String, table: String, predicates: Array[String],
          connectionProperties: Properties): DataFrame

Push the SQL as "select a, b, c, d from table as tbl"

with predicates = Array("d is not null with UR")

In this case the expected query is pushed down: Spark appends each predicate as a top-level WHERE clause, so WITH UR lands at the end of the statement, where DB2 accepts it. Hope this helps you find the direction to solve it.
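For example, here is a minimal PySpark sketch of the same fix, reusing the data_config names from the question (not tested against the asker's system). Since the original query has no WHERE condition to carry WITH UR, the trivially true predicate "1=1" below is my own assumption:

# Sketch only: each entry in predicates becomes one partition, and Spark
# appends it to the statement as a top-level WHERE clause, so WITH UR
# ends up outside the subquery. "1=1" is a hypothetical placeholder filter.
base_tbl = "(SELECT A.E_No, A.E_Name FROM Scheema.Employee A) ref_id"

db2Df = sqlContext.read.jdbc(
    url=data_config.db2_url,
    table=base_tbl,
    predicates=["1=1 WITH UR"],
    properties={
        "user": data_config.db2_uname,
        "password": data_config.db2_passwd,
        "driver": "com.ibm.db2.jcc.DB2Driver",
    },
)

Spark then sends roughly SELECT E_No, E_Name FROM (SELECT A.E_No, A.E_Name FROM Scheema.Employee A) ref_id WHERE 1=1 WITH UR, with WITH UR at statement level. Note that one predicate yields one partition; to read in parallel, pass several mutually exclusive predicates, each ending in WITH UR.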

You can see more detail on the Spark JDBC read methods here: Link