How to get SQL database connection compatible with

2019-08-21 10:03发布

问题:

I have an R script that uses odbc::dbConnect to connect to an SQL database (some databases are Azure, some are on-premise but connected to the Azure VPNs via the company's network, though I don't have any understanding of the network infrastructure itself) and then uses DBI::dbGetQuery to run a series of fairly complicated SQL queries and store the results as R dataframes which can be manipulated and fed into my models.

Because of insufficient memory on my local PC to run the script, I am having to transfer the script to a Databricks notebook and run it on a cluster with a more powerful driver node. However I am running out of time and am not able or willing to completely rewrite everything to be sparkR/sparklyr compatible or parallelisable; I just want to run my standard R script as closely to the script I have already written as possible with minimal edits for compatibility with Spark.

I am aware that the odbc, RODBC and RJDBC packages do not work on Databricks notebooks. I have looked into using SparkR::read.jdbc and Sparklyr::spark_read_jdbc() but these directly read from a jdbc source into a SparkR/sparklyr dataframe rather than creating a connection which can be accessed via DBI::dbGetQuery, which is not what I'm trying to do. I can't find any resources anywhere on how to do this, though it seems like it ought to be possible.

Is there any way I can create using sparklyr (or another package if necessary) a connection object to a database that can be accessed using DBI::dbGetQuery?