DSTREAM的笛卡尔(Cartesian of DStream)

2019-10-29 21:34发布

我用星火笛卡尔函数生成的值列表N对。

我然后映射对这些值以产生每个用户之间的距离的度量:

val cartesianUsers: org.apache.spark.rdd.RDD[(distance.classes.User, distance.classes.User)] = users.cartesian(users)
cartesianUsers.map(m => manDistance(m._1, m._2))

这按预期工作。

用放电流媒体库中创建一个DSTREAM然后映射了它:

val customReceiverStream: ReceiverInputDStream[String] = ssc.receiverStream....
customReceiverStream.foreachRDD(m => {
  println("size is " + m)
})

我可以用笛卡尔函数中customReceiverStream.foreachRDD但根据DOC http://spark.apache.org/docs/1.2.0/streaming-programming-guide.htm这并不是它的用途:

foreachRDD(FUNC),该应用一个函数,最通用的输出操作者func, to each RDD generated from the stream. This function should push the data in each RDD to a external system, like saving the RDD to files, or writing it over the network to a database. Note that the function func is executed in the driver process running the streaming application, and will usually have RDD actions in it that will force the computation of the streaming RDDs. func, to each RDD generated from the stream. This function should push the data in each RDD to a external system, like saving the RDD to files, or writing it over the network to a database. Note that the function func is executed in the driver process running the streaming application, and will usually have RDD actions in it that will force the computation of the streaming RDDs.

如何计算DSTREAM的笛卡尔? 也许我误解了使用DStreams的?

Answer 1:

我不知道变换方法的:

cartesianUsers.transform(car => car.cartesian(car))

尼斯的谈话也提到了在约17:00变换函数https://www.youtube.com/watch?v=g171ndOHgJ0



文章来源: Cartesian of DStream