Share SparkContext between Java and R Apps under the same Master

Posted 2019-02-15 01:47

Question:

So here is the setup.

Currently I have two Spark applications initialized. I need to pass data between them (preferably through a shared SparkContext/SQLContext so I can just query a temp table). Right now I use Parquet files to transfer DataFrames, but is it possible any other way?
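For reference, the Parquet hand-off I use today looks roughly like the sketch below (Spark 1.x DataFrame API). The shared path and helper class are placeholders of mine, not part of the actual setup:

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// Sketch of the file-based hand-off between the two applications.
// "/tmp/shared/handoff.parquet" is a placeholder path.
public final class ParquetHandoff {
    private static final String SHARED_PATH = "/tmp/shared/handoff.parquet";

    // Producer side (Java app): persist the DataFrame where the other app can reach it.
    public static void export(DataFrame df) {
        df.write().mode("overwrite").parquet(SHARED_PATH);
    }

    // Consumer side shown in Java for symmetry; SparkR would read the same
    // path with its Parquet reader (read.parquet / parquetFile, depending on version).
    public static DataFrame load(SQLContext sqlContext) {
        return sqlContext.read().parquet(SHARED_PATH);
    }
}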

The MasterURL in both applications points to the same Spark master.

Start Spark via Terminal:

/opt/spark/sbin/start-master.sh; 
/opt/spark/sbin/start-slave.sh spark://`hostname`:7077

Java App Setup:

// conf = setMaster(MasterURL), 6G executor memory, and 4 cores
JavaSparkContext context = new JavaSparkContext(conf);
// build the SQLContext on top of the same underlying SparkContext
SQLContext sqlContext = new SQLContext(context.sc());

Later I register an existing DataFrame as a temp table:

// register the existing DataFrame as a temp table
df.registerTempTable("table");

SparkR App Setup:

sc <- sparkR.init(master='MasterURL', sparkEnvir=list(spark.executor.memory='6G', spark.cores.max='4'))
sqlContext <- sparkRSQL.init(sc)

# attempt to read the temp table registered by the Java app
df <- sql(sqlContext, "SELECT * FROM table") # fails: the temp table is not visible here

Answer 1:

As far as I know it is not possible given your current configuration. Tables created using registerTempTable are bound to the specific SQLContext which was used to create the corresponding DataFrame. Even if your Java and SparkR applications use the same master, their drivers run on separate JVMs and cannot share a single SQLContext.
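You can see the first point even inside a single JVM. The sketch below assumes Spark 1.6's SQLContext.newSession() and a hypothetical people.json input; a temp table registered on one SQLContext is not visible from another one, even though both sit on the same SparkContext:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public final class TempTableScope {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("temp-table-scope").setMaster("local[*]");
        JavaSparkContext jsc = new JavaSparkContext(conf);

        SQLContext sqlA = new SQLContext(jsc.sc());
        SQLContext sqlB = sqlA.newSession(); // separate temp-table catalog, same SparkContext

        DataFrame df = sqlA.read().json("people.json"); // hypothetical input file
        df.registerTempTable("people");

        sqlA.sql("SELECT * FROM people").show(); // works: registered on sqlA
        sqlB.sql("SELECT * FROM people").show(); // fails: temp table not visible from sqlB
    }
}

The second query fails for the same reason the SparkR lookup above does: the temp-table catalog belongs to the SQLContext, not to the cluster.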

There are tools, like Apache Zeppelin, which take a different approach: a single SQLContext (and SparkContext) is exposed to the individual language backends. This way you can register a table using, for example, Scala and then read it from Python. There is a fork of Zeppelin which provides some support for SparkR and R; you can check there how it starts and interacts with the R backend.