Is it possible to get multiple DStreams out of a single DStream in Spark?
My use case is as follows: I am getting a stream of log data from an HDFS file.
Each log line contains an id (id=xyz).
I need to process each log line differently based on its id.
So I was trying to create a different DStream for each id from the input DStream.
I couldn't find anything related in the documentation.
Does anyone know how this can be achieved in Spark, or can you point me to a link for this?
Thanks
There is no built-in operation that splits a single DStream into multiple DStreams.
The best you can do is one of the following:
- Modify your source system to emit a separate stream for each ID; you can then run a separate job to process each stream.
- If the source cannot be changed and gives you a single stream with mixed IDs, write custom logic that identifies the ID of each record and then performs the appropriate operation.
I would always prefer the first option, as it is the cleaner solution, but there are cases where the second has to be implemented.
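For the second option, the "custom logic" could look roughly like the sketch below. It is plain Python so the per-line logic can be read and tested without a cluster. The regular expression, the handler functions and the names `extract_id`, `process_line` and `has_id` are all assumptions made for illustration; only the `id=xyz` format comes from the question.

```python
import re
from typing import Callable, Dict, Optional

# Matches the "id=xyz" token from the question.
# Assumption: ids consist of letters, digits and underscores.
ID_PATTERN = re.compile(r"\bid=(\w+)")


def extract_id(line: str) -> Optional[str]:
    """Return the id found in a log line, or None if the line has no id."""
    match = ID_PATTERN.search(line)
    return match.group(1) if match else None


# Hypothetical per-id handlers; replace with the real processing for each id.
def handle_xyz(line: str) -> str:
    return "xyz handler: " + line


def handle_abc(line: str) -> str:
    return "abc handler: " + line


HANDLERS: Dict[str, Callable[[str], str]] = {
    "xyz": handle_xyz,
    "abc": handle_abc,
}


def process_line(line: str) -> Optional[str]:
    """Dispatch a line to the handler for its id.

    Lines with no id or an unknown id yield None so they can be filtered out.
    """
    line_id = extract_id(line)
    if line_id is None:
        return None
    handler = HANDLERS.get(line_id)
    return handler(line) if handler is not None else None


def has_id(wanted: str) -> Callable[[str], bool]:
    """Build a predicate that keeps only lines carrying the given id."""
    return lambda line: extract_id(line) == wanted
```

In a PySpark Streaming job, assuming `lines` is the input DStream of log lines, these functions would be passed to the usual transformations: `lines.map(process_line).filter(lambda r: r is not None)` handles every id in one pass, while `lines.filter(has_id("xyz"))` gives a derived DStream containing only the lines for that id, computed from the same mixed input.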