Spark MySQL Error when Reading from Database

2019-08-23 04:26发布

问题:

I have the following piece of code:

Try {
  Context.sc.sqlContext.read.jdbc(url, tableName, prop)
} match {
  case Success(df) =>
    log.info(s"$tableName successfully read from $schemaName with this connection: $url")
    df
  case Failure(exception) =>
    log.error(s"$tableName >> $schemaName connection url >> $url")
    log.error(s"Error reading $tableName from $schemaName with this connection: $url :: Error Message is ${exception.getMessage}")
    throw exception
}

When I run this, I get the following Exception:

com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '-analytics-d.t_dim_sap_manufacturers WHERE 1=0' at line 1
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
        at com.mysql.jdbc.Util.getInstance(Util.java:408)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:943)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2487)
        at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
        at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1966)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:62)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:114)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:45)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
        at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:166)
        at com.eon.adp.substations.factory.TableFactory$$anonfun$1.apply(TableFactory.scala:42)
        at com.eon.adp.substations.factory.TableFactory$$anonfun$1.apply(TableFactory.scala:42)
        at scala.util.Try$.apply(Try.scala:192)
....
....
....
....

As it can be seen that the error message:

You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '-analytics-d.t_dim_sap_manufacturers WHERE 1=0' at line 1

the Schema name haw-analytics-d is truncated. I guess this has got something to do with the way MySQL produces the logs or?

When I tried to print the Schema and the table name that I pass in, I get them properly parsed by my application:

TableName = haw-analytics-d.t_dim_sap_manufacturers :: SchemaName haw-analytics-d :: connection url >> jdbc:mysql://my.datbase.server:3306/haw-analytics-d?useSSL=true&requireSSL=true

How can I see the SQL that the SparkContext is generating? Is there something in the API that I can tell to print the generated SQL statement to the console? Any ideas?

回答1:

the Schema name haw-analytics-d is truncated. I guess this has got something to do with the way MySQL produces the logs or?

I couldn't reproduce the problem with Spark 2.2.1 / MySQL Connector/J 5.1.45, so it looks like it is a bug in the specific combination you use, but yes - you're shooting yourself in the foot by using hyphenated names.

If for some reason you cannot update the components, you can try to remove database name from the URL:

jdbc:mysql://my.datbase.server:3306/?useSSL=true&requireSSL=true

and replace tableName with a query, escaping schema name with backticks:

val tableName = "(SELECT * FROM `haw-analytics-d`.t_dim_sap_manufacturers) AS t"