Spark JDBC returning dataframe only with column names

Posted 2019-05-10 15:48

Question:

I am trying to connect to a Hive table using Spark JDBC, with the following code:

val df = spark.read.format("jdbc")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .option("user", "hive")
  .option("password", "")
  .option("url", jdbcUrl)
  .option("dbtable", tableName)
  .load()

df.show()

but all I get back is an empty dataframe whose column names are prefixed with the table name, like this:

+--------------+---------------+
|tableName.uuid|tableName.name |
+--------------+---------------+
+--------------+---------------+

I've tried reading the dataframe in a lot of ways, but the result is always the same. I'm using the Hive JDBC driver, and this Hive table is located in an EMR cluster. The code also runs on the same cluster. Any help will be really appreciated. Thank you all.

Answer 1:

Please set fetchsize in the options; then it should work:

Dataset<Row> referenceData = sparkSession.read()
        .option("fetchsize", "100")
        .format("jdbc")
        .option("url", jdbc.getJdbcURL())
        .option("user", "")
        .option("password", "")
        .option("dbtable", hiveTableName)
        .load();
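Applied to the Scala snippet from the question, the same fix is just one extra option. This is a sketch under the question's own assumptions: `spark`, `jdbcUrl`, and `tableName` are the session, Hive JDBC URL, and table name already defined there, and the fetch size value of 100 is arbitrary (any positive value should do, per the answer above).

```scala
// Same read as in the question, with "fetchsize" added.
// Per the answer above, the Hive JDBC driver returns no rows
// unless a non-default fetch size is set explicitly.
val df = spark.read.format("jdbc")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .option("user", "hive")
  .option("password", "")
  .option("url", jdbcUrl)          // e.g. a jdbc:hive2:// URL, as in the question
  .option("fetchsize", "100")      // the fix: set an explicit fetch size
  .option("dbtable", tableName)
  .load()

df.show()
```

Note that Spark's JDBC option keys are case-insensitive, so `dbTable` and `dbtable` behave the same; `fetchsize` is simply passed through to the driver's `setFetchSize`.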