I'm reading through this blog post:
http://blog.jaceklaskowski.pl/2015/07/20/real-time-data-processing-using-apache-kafka-and-spark-streaming.html
It discusses using Spark Streaming and Apache Kafka to do some near-real-time processing. I understand the article; it shows how I can use Spark Streaming to read messages from a topic. What I would like to know is whether there is a Spark Streaming API I can use to write messages into a Kafka topic.
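The only approach I've been able to find so far is rolling my own writer with `foreachRDD` and a plain Kafka producer. Here's a minimal sketch of what I have in mind; the broker address, topic name, and the `String` payloads are placeholders for my actual setup:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.spark.streaming.dstream.DStream

// Publish each record of a DStream to a Kafka topic.
// Broker address and topic name are placeholders.
def writeToKafka(results: DStream[String], brokers: String, topic: String): Unit = {
  results.foreachRDD { rdd =>
    rdd.foreachPartition { partition =>
      // Create the producer inside foreachPartition so it is instantiated
      // on the executor rather than serialized from the driver
      val props = new Properties()
      props.put("bootstrap.servers", brokers)
      props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      val producer = new KafkaProducer[String, String](props)
      partition.foreach(msg => producer.send(new ProducerRecord[String, String](topic, msg)))
      producer.close()
    }
  }
}
```

Is there something built in that does this, or is a pattern like the above the way to go?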
My use case is pretty simple: I have a set of data that I read from a given source at a constant interval (say every second) using Reactive Streams. I would like to run some analytics on this data with Spark, and I want fault tolerance, which is where Kafka comes into play. So what I would essentially do is the following (please correct me if I'm wrong):
- Using Reactive Streams, get the data from the external source at constant intervals
- Pipe the result into a Kafka topic
- Using Spark Streaming, create a streaming context for the consumer
- Perform analytics on the consumed data (see the sketch after this list)
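To make steps 3 and 4 concrete, here's a rough sketch of the consumer side as I currently picture it, using the direct (receiver-less) Kafka stream; the topic name, broker address, and the trivial `count()` analytics are just placeholders:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object AnalyticsJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("analytics")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Step 3: streaming context consuming the topic the source data is piped into
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
    val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("source-data"))

    // Step 4: analytics on the consumed data (just counting records per batch here)
    messages.map(_._2).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```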
One more question: is the Streaming API in Spark an implementation of the Reactive Streams specification? Does it have backpressure handling (Spark Streaming v1.5)?
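I did come across the `spark.streaming.backpressure.enabled` configuration property, which I understand was added in 1.5. If that is the backpressure mechanism, is enabling it as simple as this?

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("analytics")
  // Supposedly lets Spark Streaming adapt the ingestion rate
  // to observed batch processing delays (1.5+)
  .set("spark.streaming.backpressure.enabled", "true")
```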