I have a Spark application that should run whenever it receives a Kafka message on a topic.
I won't be receiving more than 5-6 messages a day, so I don't want to take the Spark Streaming approach. Instead, I tried submitting the application using SparkLauncher,
but I don't like that approach either, because I have to set the Spark and Java classpaths programmatically within my code, along with all the necessary Spark properties like executor cores, executor memory, etc.
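For illustration, the kind of setup I mean looks roughly like this (the paths, class names, and property values here are placeholders):

```java
import org.apache.spark.launcher.SparkLauncher;

public class LaunchFromCode {
    public static void main(String[] args) throws Exception {
        // All of this has to be hard-coded or wired up inside my application,
        // which is exactly what I'd like to avoid.
        Process spark = new SparkLauncher()
                .setSparkHome("/opt/spark")                       // Spark classpath
                .setJavaHome("/usr/lib/jvm/java-8-openjdk-amd64") // Java classpath
                .setAppResource("/path/to/my-spark-app.jar")
                .setMainClass("com.example.MySparkApp")
                .setMaster("yarn")
                .setConf(SparkLauncher.EXECUTOR_CORES, "2")
                .setConf(SparkLauncher.EXECUTOR_MEMORY, "4g")
                .launch();
        spark.waitFor();
    }
}
```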
How do I trigger the Spark application to run from spark-submit, but make it wait until it receives a message?
Any pointers are very helpful.
You can use a shell-script approach: wrap your spark-submit command in a script and submit the job with the nohup command, like this:

    nohup ./your-spark-submit-script.sh <parameters> 2>&1 < /dev/null &

Whenever you get a message, you can poll for that event and call this shell script. Below are code snippets for several ways to do this. For more background on nohup, have a look at https://en.wikipedia.org/wiki/Nohup
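For the polling side, here is a minimal sketch of a Kafka consumer loop that fires the script whenever a message arrives. The broker address, group id, topic name, and script path are all placeholders, and the launch call can be swapped for any of the techniques listed below:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SubmitOnMessage {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "spark-job-trigger");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-trigger-topic"));
            while (true) {
                // Block for up to 5 seconds waiting for new messages.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
                for (ConsumerRecord<String, String> record : records) {
                    try {
                        // Fire the spark-submit wrapper script for each message,
                        // passing the message value through as an argument.
                        new ProcessBuilder("/path/to/your-spark-submit-script.sh", record.value())
                                .inheritIO()
                                .start();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }
        }
    }
}
```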
- Using `Runtime`:
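A minimal sketch, assuming the wrapper script path is a placeholder:

```java
import java.io.IOException;

public class RuntimeLauncher {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Runtime.exec starts the script as a child process of the JVM.
        Process p = Runtime.getRuntime()
                .exec(new String[] {"/bin/bash", "/path/to/your-spark-submit-script.sh"});
        // Since the script backgrounds spark-submit with nohup, waitFor
        // returns as soon as the script itself finishes.
        System.out.println("submit script exited with " + p.waitFor());
    }
}
```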
- Using `ProcessBuilder`:
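`ProcessBuilder` gives you more control than `Runtime.exec`, e.g. over environment variables and I/O redirection. A sketch with the same placeholder script and log paths:

```java
import java.io.File;
import java.io.IOException;

public class ProcessBuilderLauncher {
    public static void main(String[] args) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("/bin/bash", "/path/to/your-spark-submit-script.sh");
        pb.redirectErrorStream(true);                         // merge stderr into stdout
        pb.redirectOutput(new File("/tmp/spark-submit.log")); // capture the script's output
        Process p = pb.start();
        System.out.println("submit script exited with " + p.waitFor());
    }
}
```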
- A third way, JSch: run the command over SSH with JSch.
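This is useful when spark-submit has to run on a different machine, such as a cluster edge node. A minimal sketch; the host, user, and key path are placeholders:

```java
import com.jcraft.jsch.ChannelExec;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;

public class JschLauncher {
    public static void main(String[] args) throws Exception {
        JSch jsch = new JSch();
        jsch.addIdentity("/home/user/.ssh/id_rsa");
        Session session = jsch.getSession("user", "edge-node.example.com", 22);
        session.setConfig("StrictHostKeyChecking", "no"); // demo only; verify host keys in production
        session.connect();

        ChannelExec channel = (ChannelExec) session.openChannel("exec");
        // Run the wrapper script on the remote host; nohup keeps it alive
        // after the SSH channel closes.
        channel.setCommand("nohup /path/to/your-spark-submit-script.sh > /tmp/submit.log 2>&1 &");
        channel.connect();
        channel.disconnect();
        session.disconnect();
    }
}
```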
- A fourth way, the `YarnClient` class: one of my favourite books, Data Algorithms, uses this approach.
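A heavily simplified skeleton of the YARN client API: it asks the ResourceManager for an application id and submits an application. Note that a real Spark submission also has to ship the application jar and Spark libraries as local resources and launch Spark's ApplicationMaster; that wiring is omitted here (spark-submit normally does it for you), so treat this only as a sketch of the API shape. The command, memory, and core values are placeholders:

```java
import java.util.Collections;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class YarnClientLauncher {
    public static void main(String[] args) throws Exception {
        YarnConfiguration conf = new YarnConfiguration();
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();

        // Ask the ResourceManager for a new application id.
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
        ctx.setApplicationName("spark-on-demand");

        // Placeholder launch command; a real Spark submission would set up
        // Spark's ApplicationMaster and its local resources here.
        ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
        amContainer.setCommands(Collections.singletonList(
                "/path/to/your-spark-submit-script.sh"));
        ctx.setAMContainerSpec(amContainer);

        // Resources for the application master container.
        Resource amResource = Records.newRecord(Resource.class);
        amResource.setMemory(1024);
        amResource.setVirtualCores(1);
        ctx.setResource(amResource);

        ApplicationId appId = yarnClient.submitApplication(ctx);
        System.out.println("Submitted YARN application " + appId);
    }
}
```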