In Spark Streaming, should we offload the saving part to another layer, because the Spark Streaming context is not available when we use the Spark Cassandra Connector (our database is Cassandra)? Moreover, even if we use some other database to save our data, we need to create a connection on the worker every time we process a batch of RDDs, because connection objects are not serializable.
Is it recommended to create/close connections at workers?
It would also make our system tightly coupled with the existing database; tomorrow we may change the database.
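For reference, the kind of direct write the Spark Cassandra Connector offers looks roughly like this; a minimal sketch, where the socket source, contact point, keyspace and table names are all placeholders:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import com.datastax.spark.connector.streaming._

object StreamToCassandra {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("stream-to-cassandra")
      .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder contact point

    val ssc = new StreamingContext(conf, Seconds(5))

    // Placeholder source: lines of "word,count" arriving on a local socket.
    val counts = ssc.socketTextStream("localhost", 9999)
      .map(_.split(","))
      .map(fields => (fields(0), fields(1).toInt))

    // The connector opens and caches Cassandra sessions on the workers itself,
    // so no connection object has to be serialized from the driver.
    // Keyspace and table names are placeholders.
    counts.saveToCassandra("my_keyspace", "word_counts")

    ssc.start()
    ssc.awaitTermination()
  }
}
```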
To answer your questions:
Possible duplicate of: Handle database connection inside spark streaming
Read this link; it should clarify some of your questions: Design Patterns for using foreachRDD
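In short, the pattern described there is to create (or borrow) the connection inside foreachPartition, so it lives entirely on the worker and its cost is amortized over a whole partition rather than paid per record. A minimal sketch, assuming a generic JDBC sink and a hypothetical ConnectionPool helper:

```scala
import java.sql.{Connection, DriverManager}
import org.apache.spark.streaming.dstream.DStream

// Hypothetical helper. A real implementation would be a lazily initialized,
// bounded pool (e.g. HikariCP) living as a singleton on each executor; here
// it just opens and closes a JDBC connection so the sketch stays self-contained.
object ConnectionPool {
  def getConnection(): Connection =
    DriverManager.getConnection("jdbc:postgresql://localhost/streamdb") // placeholder URL
  def returnConnection(conn: Connection): Unit = conn.close()
}

def saveStream(records: DStream[String]): Unit = {
  records.foreachRDD { rdd =>
    rdd.foreachPartition { partitionOfRecords =>
      // The connection is obtained on the worker, once per partition,
      // so nothing non-serializable ever leaves the driver.
      val connection = ConnectionPool.getConnection()
      partitionOfRecords.foreach { record =>
        val stmt = connection.prepareStatement(
          "INSERT INTO events (payload) VALUES (?)") // placeholder table/schema
        stmt.setString(1, record)
        stmt.executeUpdate()
        stmt.close()
      }
      ConnectionPool.returnConnection(connection)
    }
  }
}
```

Creating connections at the workers is the recommended approach; the middle ground is one connection per partition (or a pooled connection reused across batches), since a per-record connection is too expensive and a single driver-side connection cannot be serialized to the workers.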
Hope this helps!