How to select specific columns through Spark JDBC?

Published 2019-07-28 16:00

Question:

I am using Spark to connect to my Oracle database. However, one of the columns has the type "TIMESTAMP WITH TIME ZONE", which is Oracle-specific. When I load data from a table that contains a column of this type, it throws the error "java.sql.SQLException: Unsupported type -101".

Does anybody know how to load only specific columns from a table? That way I can avoid selecting the "TIMESTAMP WITH TIME ZONE" column. It would be even better if someone could explain the "java.sql.SQLException: Unsupported type -101" error, though I suspect it may be a Spark bug.

My code is below; thanks a lot.

from pyspark.sql import SparkSession

spark = SparkSession\
    .builder\
    .appName("TestSQL")\
    .getOrCreate()

# Reading the whole table fails with "Unsupported type -101" because it
# contains a TIMESTAMP WITH TIME ZONE column
orc = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:oracle:thin:xxx/xxx@IP:1521/database") \
    .option("dbtable", "xxx.xxx") \
    .load()

Answer 1:

In the options, you can pass a SQL query via the dbtable key, and in that query you can select only the required columns.

For example:

// The parenthesized query runs on the database side, so only the
// selected columns are fetched
final String dbTable =
        "(select emp_no, concat_ws(' ', first_name, last_name) as full_name from employees) as employees_name";

// emp_no serves as the partition column: 10 partitions over the given bounds
Dataset<Row> jdbcDF =
        sparkSession.read().jdbc(CONNECTION_URL, dbTable, "emp_no", 10001, 499999, 10, connectionProperties);

*The code above is in Java.

Source: Loading database data using Spark 2.0 Data Sources API
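
Since the question uses PySpark, here is the same idea adapted to the asker's setup. This is a minimal sketch: the column names in the subquery are placeholders, and the subquery must be valid Oracle SQL because it is executed by the database, not by Spark.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("TestSQL").getOrCreate()

# Wrap a column-selecting query in parentheses and give it an alias;
# Oracle executes the subquery, so the TIMESTAMP WITH TIME ZONE column
# is never fetched. "id" and "name" are placeholder column names.
query = "(SELECT id, name FROM xxx.xxx) t"

orc = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:oracle:thin:xxx/xxx@IP:1521/database") \
    .option("dbtable", query) \
    .load()

If you still need the timestamp values, you can cast the column to a supported type inside the subquery, e.g. CAST(ts_col AS TIMESTAMP), instead of dropping it.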



Answer 2:

Another approach is to create views in Oracle, so that Oracle-specific data types are handled within the database itself.
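
A minimal sketch of this approach, with an illustrative view name and column names: the view casts the TIMESTAMP WITH TIME ZONE column to a plain TIMESTAMP, so Spark's JDBC type mapping never sees the unsupported type.

# On the Oracle side (illustrative DDL; my_table_v and ts_col are
# placeholder names):
#
#   CREATE VIEW my_table_v AS
#   SELECT id, name, CAST(ts_col AS TIMESTAMP) AS ts_col
#   FROM xxx.xxx;

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("TestSQL").getOrCreate()

# Spark then reads the view like any ordinary table
df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:oracle:thin:xxx/xxx@IP:1521/database") \
    .option("dbtable", "my_table_v") \
    .load()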