Consume Spark SQL dataset as RDD based job

2019-08-02 17:01发布

站内文章 / Java

9 0

做自己的国王

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

Spark dataframe have toRDD() method but I don't understand how It's useful. Can we start a SQL streaming job by processing converted source dataset to RDD instead of making and starting DataStreamWriter?

回答1:

Dataset provides uniform API for both batch and streaming processing and not every method is applicable to streaming Datasets. If you search carefully, you'll find other methods which cannot be used with streaming Datasets (for example describe).