How to share data from Spark RDD between two appli

What is the best way to share Spark RDD data between two Spark jobs?

I have a case where job 1, a Spark Streaming app with a sliding window, consumes data at regular intervals and creates RDDs. We do not want to persist these to storage.

Job 2 is a query job that will access the same RDDs created by job 1 and generate reports.

I have seen a few questions where Spark Job Server was suggested, but as it is open source I am not sure whether it is a viable solution for us. Any pointers would be a great help.

Thank you!

Tags: apache-spark rdd sharing

3 answers

倾城　Initia

#2 · 2019-02-18 18:58

According to the official documentation:

Note that none of the modes currently provide memory sharing across applications. If you would like to share data this way, we recommend running a single server application that can serve multiple requests by querying the same RDDs. http://spark.apache.org/docs/latest/job-scheduling.html
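To make the "single server application" idea concrete, here is a minimal PySpark sketch, not a definitive implementation. It assumes one long-running driver caches the RDD and answers each request from its own thread, which the same job-scheduling page describes as supported. The names `count_matching`, `serve_requests` and `main`, the `local[2]` master and the sample data are all made up for illustration; in the asker's case the cached RDD would come from the streaming window instead.

```python
from concurrent.futures import ThreadPoolExecutor


def count_matching(shared_rdd, predicate):
    """Handle one 'request': run a Spark action against the shared, cached RDD."""
    return shared_rdd.filter(predicate).count()


def serve_requests(shared_rdd, predicates, max_workers=4):
    """Answer several requests concurrently inside ONE Spark application.

    Each request submits its job from a separate thread, and every job
    reads the same cached RDD, so nothing is shared across applications.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the predicates.
        return list(pool.map(lambda p: count_matching(shared_rdd, p), predicates))


def main():
    # pyspark is imported here so the helpers above stay importable without it.
    from pyspark import SparkContext

    sc = SparkContext("local[2]", "shared-rdd-server")  # hypothetical master/app name
    try:
        events = sc.parallelize(range(100)).cache()  # stand-in for the windowed data
        events.count()  # run one action so the cache is populated
        print(serve_requests(events, [lambda x: x % 2 == 0, lambda x: x > 90]))
    finally:
        sc.stop()
```

Calling `main()` starts a local context and runs two queries against the cached data. In a real deployment the requests would arrive over some RPC or REST layer, which is roughly the role Spark Job Server plays.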


戒情不戒烟

#3 · 2019-02-18 19:09

You can share RDDs across different applications using Apache Ignite. Apache Ignite provides a shared RDD abstraction through which one application can access RDDs created by another. In addition, Ignite supports SQL indexes, whereas native Spark does not. See https://ignite.apache.org/features/igniterdd.html for more details.


神经病院院长

#4 · 2019-02-18 19:17

The short answer is that you can't share RDDs between jobs. The only way to share the data is to write it to HDFS and then read it back in the other job. If speed is an issue and you want to maintain a constant stream of data, you can use HBase, which allows very fast access and processing from the second job.

To get a better idea you should look here:

Serializing RDD
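As a rough sketch of the write-then-read handoff described above (assuming PySpark and its `saveAsPickleFile` / `pickleFile` calls): job 1 writes each window to its own path and job 2 rebuilds an RDD from those files. The helper names `publish_window`, `load_window` and `demo`, the path layout and the sample records are hypothetical; on a cluster the path would be an `hdfs://` location rather than a temporary local directory.

```python
def publish_window(window_rdd, path):
    """Job 1: write one window's RDD so that another application can load it."""
    # The target path must not already exist, so use a fresh path per window.
    window_rdd.saveAsPickleFile(path)


def load_window(sc, path):
    """Job 2: rebuild an RDD from the files that job 1 wrote."""
    return sc.pickleFile(path)


def demo():
    # Local round trip in a single process, only to show the two calls together.
    import os
    import tempfile

    from pyspark import SparkContext

    sc = SparkContext("local[2]", "rdd-handoff-demo")  # hypothetical master/app name
    try:
        path = os.path.join(tempfile.mkdtemp(), "window-0001")
        publish_window(sc.parallelize([("a", 1), ("b", 2)]), path)
        return sorted(load_window(sc, path).collect())
    finally:
        sc.stop()
```

Note that this does persist the data to storage, which the question wanted to avoid, so it only fits if that constraint can be relaxed.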

