I have a Spark job that reads an HBase table, performs some aggregations, and stores the data in MongoDB. Currently I run this job manually using the `spark-submit` script. I want to schedule it to run at a fixed interval.
How can I achieve this using Java?
Is there a library for this, or can I do it with a Thread in Java?
Any suggestions are appreciated!
If you still want to use `spark-submit`, I would rather use crontab (or a similar scheduler) to run a bash script that calls it. But if you need to run `spark-submit` from Java, take a look at the package `org.apache.spark.launcher`. With it you can start the application programmatically using `SparkLauncher`.

But your question was about a scheduling library. I would use the Quartz Job Scheduling Library: it is really popular (as far as I know, Spring also integrates with the Quartz Scheduler). Just add the Maven dependency `org.quartz-scheduler:quartz`. If you would rather avoid an extra dependency, a simple `java.util.Timer` with a `java.util.TimerTask`, scheduled from a given `Date`, also works.
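The plain `java.util.Timer` option can be sketched as follows. This is a minimal, JDK-only sketch: the class name `TimerScheduling` is hypothetical, and the `Runnable` you pass in stands in for whatever actually launches Spark (a `spark-submit` process or `SparkLauncher`). A single `Timer` runs its tasks one at a time on one background thread, and an exception escaping a task kills that thread and cancels all later runs, which is why the task body is wrapped in a try/catch.

```java
import java.util.Date;
import java.util.Timer;
import java.util.TimerTask;

/**
 * Minimal JDK-only sketch: repeatedly runs a job with java.util.Timer.
 * The Runnable is a hypothetical placeholder for the code that launches Spark.
 */
public class TimerScheduling {

    /**
     * Runs {@code job} first at {@code firstRun}, then repeatedly with
     * fixed-delay execution every {@code periodMs} milliseconds.
     * Call {@code cancel()} on the returned Timer to stop it.
     */
    public static Timer scheduleRepeating(Runnable job, Date firstRun, long periodMs) {
        // Non-daemon thread, so the JVM stays alive between runs until cancel() is called
        Timer timer = new Timer("spark-job-timer");
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                try {
                    job.run();
                } catch (RuntimeException e) {
                    // An exception escaping run() would kill the Timer thread
                    // and silently cancel every later execution
                    System.err.println("Scheduled job failed: " + e);
                }
            }
        }, firstRun, periodMs);
        return timer;
    }
}
```

For example, `TimerScheduling.scheduleRepeating(() -> launchSpark(), new Date(), TimeUnit.HOURS.toMillis(1))` would start immediately and repeat hourly, where `launchSpark()` is a hypothetical method of yours. `schedule(TimerTask, Date, long)` uses fixed-delay execution; `scheduleAtFixedRate` is the fixed-rate variant.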
Next, create a Quartz job that launches the Spark application.
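A sketch of such a job, assuming the `quartz` and `spark-launcher` artifacts are on the classpath and that `SPARK_HOME` is set (or `setSparkHome` is called on the launcher). The class name `SparkJob`, the jar path, the main class and the master are hypothetical placeholders. `@DisallowConcurrentExecution` keeps Quartz from starting a new run while the previous Spark application is still running, and the child process's output is redirected so it cannot block on a full output buffer.

```java
import java.io.IOException;

import org.apache.spark.launcher.SparkLauncher;
import org.quartz.DisallowConcurrentExecution;
import org.quartz.Job;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;

/**
 * Sketch of a Quartz job that launches the Spark application via SparkLauncher.
 * Jar path, main class and master are hypothetical placeholders.
 */
@DisallowConcurrentExecution // don't start a new run while the previous one is still going
public class SparkJob implements Job {

    @Override
    public void execute(JobExecutionContext context) throws JobExecutionException {
        try {
            Process spark = new SparkLauncher()
                    .setAppResource("/path/to/hbase-to-mongo.jar") // hypothetical application jar
                    .setMainClass("com.example.HBaseToMongoJob")   // hypothetical main class
                    .setMaster("yarn")                             // adjust to your cluster
                    .redirectOutput(ProcessBuilder.Redirect.INHERIT)
                    .redirectError(ProcessBuilder.Redirect.INHERIT)
                    .launch();
            int exitCode = spark.waitFor(); // block until spark-submit finishes
            if (exitCode != 0) {
                throw new JobExecutionException("spark-submit exited with code " + exitCode);
            }
        } catch (IOException e) {
            throw new JobExecutionException("could not launch spark-submit", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for Quartz
            throw new JobExecutionException("interrupted while waiting for Spark", e);
        }
    }
}
```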
Then create a trigger and schedule the job with it.
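A sketch of this step with the Quartz 2.x builder API. The helper name `scheduleEvery` and the job and trigger identities are hypothetical; it accepts any Quartz `Job` class, such as the Spark job from the previous step.

```java
import static org.quartz.JobBuilder.newJob;
import static org.quartz.SimpleScheduleBuilder.simpleSchedule;
import static org.quartz.TriggerBuilder.newTrigger;

import org.quartz.Job;
import org.quartz.JobDetail;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.Trigger;
import org.quartz.impl.StdSchedulerFactory;

public class SparkJobScheduler {

    /** Schedules jobClass to run immediately and then every intervalHours, forever. */
    public static Scheduler scheduleEvery(Class<? extends Job> jobClass, int intervalHours)
            throws SchedulerException {
        JobDetail job = newJob(jobClass)
                .withIdentity("sparkJob", "spark")        // hypothetical job key
                .build();

        Trigger trigger = newTrigger()
                .withIdentity("sparkTrigger", "spark")    // hypothetical trigger key
                .startNow()
                .withSchedule(simpleSchedule()
                        .withIntervalInHours(intervalHours)
                        .repeatForever())
                .build();

        // Default scheduler configured from the quartz.properties bundled with Quartz
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        scheduler.scheduleJob(job, trigger);
        scheduler.start();
        return scheduler;
    }
}
```

Calling `SparkJobScheduler.scheduleEvery(SparkJob.class, 1)`, with `SparkJob` standing for your job class, would run it right away and then every hour. For calendar-based schedules, `CronScheduleBuilder.cronSchedule("0 0 2 * * ?")` (Quartz cron syntax, which starts with a seconds field) can replace `simpleSchedule()`; that expression fires every day at 02:00.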
Alternatively, if you have a Spring Boot application, scheduling a method is very easy: just add `@EnableScheduling` to a configuration class and annotate the method that should run periodically with `@Scheduled`.
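A minimal sketch of the Spring variant, assuming Spring Boot plus the `spark-launcher` artifact on the classpath; the class names, jar path and main class are hypothetical. `fixedDelay` waits a fixed time between the end of one run and the start of the next, which suits a batch job that should not overlap with itself; `fixedRate` and `cron` are the other scheduling options of `@Scheduled`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.spark.launcher.SparkLauncher;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Turns on processing of @Scheduled annotations
@Configuration
@EnableScheduling
class SchedulingConfig {
}

@Component
public class SparkJobRunner {

    // First run 10 s after startup, then one hour after each run completes
    @Scheduled(fixedDelay = 60 * 60 * 1000, initialDelay = 10 * 1000)
    public void runSparkJob() {
        try {
            Process spark = new SparkLauncher()
                    .setAppResource("/path/to/hbase-to-mongo.jar") // hypothetical application jar
                    .setMainClass("com.example.HBaseToMongoJob")   // hypothetical main class
                    .setMaster("yarn")                             // adjust to your cluster
                    .redirectOutput(ProcessBuilder.Redirect.INHERIT)
                    .redirectError(ProcessBuilder.Redirect.INHERIT)
                    .launch();
            int exitCode = spark.waitFor();
            if (exitCode != 0) {
                // Spring's default error handler for repeating tasks logs this; later runs still happen
                throw new IllegalStateException("spark-submit exited with code " + exitCode);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // application is shutting down
        }
    }
}
```

For a fixed time of day, `@Scheduled(cron = "0 0 2 * * *")` (Spring's six-field cron format) would run the method every day at 02:00 instead.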