Spark JDBC returning DataFrame with only column names

Asked 2019-05-10 15:20

I am trying to connect to a HiveTable using spark JDBC, with the following code:

val df = spark.read.format("jdbc")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .option("user", "hive")
  .option("password", "")
  .option("url", jdbcUrl)
  .option("dbtable", tableName)
  .load()

df.show()

but all I get back is an empty DataFrame whose column names are prefixed with the table name, like this:

+--------------+--------------+
|tableName.uuid|tableName.name|
+--------------+--------------+
+--------------+--------------+

I've tried to read the DataFrame in several ways, but the result is always the same. I'm using the Hive JDBC driver, and the Hive table lives in an EMR cluster; the code also runs in that same cluster. Any help would be really appreciated. Thank you all.

1 Answer

Ridiculous · answered 2019-05-10 15:59

Please set fetchsize in the options; it should work:

Dataset<Row> referenceData = sparkSession.read()
        .option("fetchsize", "100")
        .format("jdbc")
        .option("url", jdbc.getJdbcURL())
        .option("user", "")
        .option("password", "")
        .option("dbtable", hiveTableName)
        .load();
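Applied to the asker's original Scala reader, the fix is a one-line addition. This is a sketch under the same assumptions as the question (jdbcUrl and tableName are the asker's own variables; 100 is an arbitrary fetch size you should tune for your data):

```scala
// Same reader as in the question, with fetchsize added so the
// Hive JDBC driver actually streams rows back instead of an
// empty result. Assumes jdbcUrl and tableName are defined as
// in the original post and `spark` is an active SparkSession.
val df = spark.read.format("jdbc")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .option("user", "hive")
  .option("password", "")
  .option("url", jdbcUrl)
  .option("dbtable", tableName)
  .option("fetchsize", "100") // rows fetched per round trip
  .load()

df.show()
```

fetchsize is a standard option of Spark's JDBC data source (it is passed through to the driver's ResultSet fetch size), so it works the same way from Scala as from the Java snippet above.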