The example in the Spark Streaming programming guide (http://spark.apache.org/docs/latest/streaming-programming-guide.html) lets me receive data from a TCP stream on port 9999:
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._ // not necessary since Spark 1.3
// Create a local StreamingContext with two worker threads and a batch interval of 1 second.
// The master requires 2 cores to prevent a starvation scenario.
val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
val ssc = new StreamingContext(conf, Seconds(1))
// Create a DStream that will connect to hostname:port, like localhost:9999
val lines = ssc.socketTextStream("localhost", 9999)
// Split each line into words
val words = lines.flatMap(_.split(" "))
// Count each word in each batch
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)
// Print the first ten elements of each RDD generated in this DStream to the console
wordCounts.print()
ssc.start() // Start the computation
ssc.awaitTermination() // Wait for the computation to terminate
I am able to send data over TCP by starting a data server on my Linux system with:
$ nc -lk 9999
Question
I need to receive a stream from an Android phone that streams over UDP, and the Scala/Spark call
val lines = ssc.socketTextStream("localhost", 9999)
receives ONLY TCP streams.
How can I receive UDP streams in a similarly simple manner using Scala and Spark, and create a Spark DStream from them?
There isn't anything built in, but it's not too much work to do it yourself. Here is a simple solution I made, based on a custom UdpSocketInputDStream[T]:
To get StreamingContext to add a method on itself, we enrich it with an implicit class:
And here is how you call it all:
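The three pieces described above (a UDP input source, an implicit class on StreamingContext, and the call site) could be sketched roughly as follows. This is a minimal sketch and not the answer's own code: it assumes Spark's public Receiver API and ssc.receiverStream instead of a custom ReceiverInputDStream subclass, it assumes the packets carry UTF-8 text, and the names UdpReceiver, UdpStreaming, udpTextStream, and UdpWordCount are made up for illustration.

```scala
import java.net.{DatagramPacket, DatagramSocket, SocketException}
import java.nio.charset.StandardCharsets

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.ReceiverInputDStream
import org.apache.spark.streaming.receiver.Receiver

// Hypothetical receiver: each UDP datagram is decoded as UTF-8 text (an assumption)
// and every non-empty line in it is stored as one element of the stream.
class UdpReceiver(port: Int, bufferSize: Int = 2048)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  @volatile private var socket: DatagramSocket = _

  // onStart must not block, so the receive loop runs on its own thread.
  override def onStart(): Unit = {
    new Thread("UDP Receiver") {
      override def run(): Unit = receive()
    }.start()
  }

  // Closing the socket unblocks a pending receive() call.
  override def onStop(): Unit = {
    if (socket != null) socket.close()
  }

  private def receive(): Unit = {
    try {
      socket = new DatagramSocket(port)
      val buffer = new Array[Byte](bufferSize)
      while (!isStopped()) {
        val packet = new DatagramPacket(buffer, buffer.length)
        socket.receive(packet) // blocks until a datagram arrives
        val text = new String(packet.getData, packet.getOffset, packet.getLength,
          StandardCharsets.UTF_8)
        text.split("\n").filter(_.nonEmpty).foreach(line => store(line))
      }
    } catch {
      case _: SocketException if isStopped() => // socket was closed by onStop
      case e: Throwable => restart("Error receiving UDP data", e)
    }
  }
}

object UdpStreaming {
  // Enriches StreamingContext with a udpTextStream method (hypothetical name).
  implicit class UdpStreamingContext(val ssc: StreamingContext) {
    def udpTextStream(port: Int, bufferSize: Int = 2048): ReceiverInputDStream[String] =
      ssc.receiverStream(new UdpReceiver(port, bufferSize))
  }
}

object UdpWordCount {
  import UdpStreaming._

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("UdpWordCount")
    val ssc = new StreamingContext(conf, Seconds(1))
    val lines = ssc.udpTextStream(9999) // same shape as socketTextStream, but over UDP
    lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

With this shape the rest of the word-count pipeline from the question stays unchanged; only the line that creates the DStream differs.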
Most of this code is taken from the SocketInputDStream[T] provided by Spark; I simply reused it. I also took the code for the NextIterator, which is used by bytesToLines; all it does is consume a line from the packet and transform it to a String. If you have more complex logic, you can provide it by passing your own implementation as converter: InputStream => Iterator[T].
Testing it with a simple UDP packet:
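One way to produce such a test packet without a phone is a small sender built on java.net.DatagramSocket. This is only a sketch: the object name UdpSend, the default host localhost, the default port 9999 (matching the port used in the question), and the payload are assumptions chosen for illustration.

```scala
import java.net.{DatagramPacket, DatagramSocket, InetAddress}
import java.nio.charset.StandardCharsets

object UdpSend {
  // Sends `message` as a single UDP datagram and returns the number of bytes sent.
  def send(message: String, host: String = "localhost", port: Int = 9999): Int = {
    val bytes = message.getBytes(StandardCharsets.UTF_8)
    val socket = new DatagramSocket() // bound to any free local port
    try {
      socket.send(new DatagramPacket(bytes, bytes.length, InetAddress.getByName(host), port))
      bytes.length
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit =
    send("hello spark hello udp\n")
}
```

Run it while the streaming job is listening on the same port. UDP gives no delivery confirmation, so a successful send only means the datagram left the sender.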
Yields:
Of course, this has to be tested further. It also has a hidden assumption that each buffer created from the DatagramPacket is 2048 bytes, which is perhaps something you'll want to change.
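Why the buffer size matters can be seen with a short loopback experiment using only java.net: a datagram larger than the receive buffer is silently truncated to the buffer length. The sketch below is illustrative; UdpTruncationDemo and receivedLength are hypothetical names, and the 2-second timeout is an arbitrary safety net.

```scala
import java.net.{DatagramPacket, DatagramSocket, InetAddress}

object UdpTruncationDemo {
  // Sends `payloadSize` bytes in one datagram to a local socket that reads into a
  // `bufferSize`-byte buffer, and returns how many bytes the receiver actually got.
  def receivedLength(payloadSize: Int, bufferSize: Int = 2048): Int = {
    val loopback = InetAddress.getLoopbackAddress
    val receiver = new DatagramSocket(0, loopback) // port 0: the OS picks a free port
    val sender = new DatagramSocket()
    try {
      receiver.setSoTimeout(2000) // fail instead of hanging if the packet is lost
      val payload = Array.fill[Byte](payloadSize)('x'.toByte)
      sender.send(new DatagramPacket(payload, payload.length, loopback, receiver.getLocalPort))
      val packet = new DatagramPacket(new Array[Byte](bufferSize), bufferSize)
      receiver.receive(packet)
      packet.getLength
    } finally {
      sender.close()
      receiver.close()
    }
  }

  def main(args: Array[String]): Unit = {
    println(receivedLength(100))  // fits in the buffer
    println(receivedLength(3000)) // larger than the buffer, so it is cut off
  }
}
```

If the phone can send datagrams larger than 2048 bytes, the buffer should be sized for the largest expected packet, since the bytes that do not fit are dropped without an error.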