Using predicates in spark jdbc read

Published 2019-05-15 05:00

Question:

I am pulling data from SQL Server into HDFS. Here is my snippet for that:

val predicates = Array[String]("int_id < 500000", "int_id >= 500000 && int_id < 1000000")

  val jdbcDF = spark.read.format("jdbc")
      .option("url", dbUrl)
      .option("databaseName", "DatabaseName")
      .option("dbtable", table)
      .option("user", "***")
      .option("password", "***")
      .option("predicates", predicates)
      .load()

My IntelliJ IDE keeps saying

"Type mismatch, expected Boolean or Long or Double or String, Actual : Array[String]"

on the predicates line. I'm not sure what's wrong with this. Can anyone see the problem? Also, how do I use fetch size here?

Thanks.

Answer 1:

The option method accepts only a Boolean, Long, Double, or String, which is why the IDE rejects an Array[String]. To pass the predicates as an Array[String], use the jdbc method on DataFrameReader instead of the format("jdbc") / option approach. Note also that each predicate is a SQL WHERE-clause fragment that gets pushed down to the server, so use SQL's AND rather than &&.

import java.util.Properties

val predicates = Array[String]("int_id < 500000", "int_id >= 500000 AND int_id < 1000000")

val connectionProperties = new Properties()
connectionProperties.put("user", "***")
connectionProperties.put("password", "***")
connectionProperties.put("databaseName", "DatabaseName")

val jdbcDF = spark.read.jdbc(
  url = dbUrl,
  table = table,
  predicates = predicates,
  connectionProperties = connectionProperties
)

You can see an example here.
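As for the fetch-size question: the JDBC source supports a fetchsize option that controls how many rows the driver retrieves per round trip (it often defaults to a small value, e.g. 10 for some drivers, which hurts read throughput). A minimal sketch, assuming the same dbUrl, table, and predicates as above; the value 10000 is just an illustrative starting point to tune:

```scala
import java.util.Properties

val connectionProperties = new Properties()
connectionProperties.put("user", "***")
connectionProperties.put("password", "***")
// fetchsize: rows fetched per round trip to the database
connectionProperties.put("fetchsize", "10000")

val jdbcDF = spark.read.jdbc(
  url = dbUrl,
  table = table,
  predicates = predicates,
  connectionProperties = connectionProperties
)
```

The same key also works as .option("fetchsize", "10000") in the format("jdbc") style, since it is a String.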