Which is the recommended tool for scheduling Spark jobs on a daily/weekly basis? 1) Oozie 2) Luigi 3) Azkaban 4) Chronos 5) Airflow
Thanks in advance.
Updating my previous answer from here: Suggestion for scheduling tool(s) for building hadoop based data pipelines
Philosophy:
Simpler pipelines are better than complex pipelines: they are easier to create, easier to understand (especially when you didn't create them) and easier to debug/fix.
When complex actions are needed you want to encapsulate them in a way that either completely succeeds or completely fails.
If you can make it idempotent (running it again creates identical results) then that’s even better.
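To make the last two points concrete, here is a minimal Python sketch of one daily step that either completely succeeds or leaves nothing behind, and that produces identical output when rerun for the same date. It is an illustration under assumptions, not something any of the listed schedulers provides: a local filesystem stands in for HDFS/S3, and `run_daily_partition`, `output_root`, `run_date`, `records`, the `date=...` directory layout and the `_SUCCESS` marker are all hypothetical names chosen for the example.

```python
import os
import shutil
import tempfile


def run_daily_partition(output_root: str, run_date: str, records: list) -> str:
    """Hypothetical daily step: write one date's output all-or-nothing.

    output_root, run_date and records are illustrative stand-ins for a Spark
    job's output location, its logical run date, and the rows it produces.
    """
    final_dir = os.path.join(output_root, f"date={run_date}")
    os.makedirs(output_root, exist_ok=True)
    # Stage all writes in a temp directory inside output_root (same filesystem),
    # so a crash mid-write never leaves a half-written final partition.
    staging_dir = tempfile.mkdtemp(prefix=".staging-", dir=output_root)
    try:
        # Sorting makes the output deterministic, so reruns give identical files.
        rows = sorted(records)
        with open(os.path.join(staging_dir, "part-00000.txt"), "w") as f:
            for row in rows:
                f.write(row + "\n")
        # Marker file that downstream steps could check before reading.
        with open(os.path.join(staging_dir, "_SUCCESS"), "w"):
            pass
        # Idempotency: a rerun for the same date replaces the old output
        # instead of appending to it.
        if os.path.exists(final_dir):
            shutil.rmtree(final_dir)
        os.rename(staging_dir, final_dir)
    except BaseException:
        # On any failure, discard the staging directory; nothing partial remains.
        shutil.rmtree(staging_dir, ignore_errors=True)
        raise
    return final_dir
```

Calling it twice for the same `run_date` yields the same directory with the same contents, and a failing run leaves only the previously published partitions. One caveat of this sketch: removing the old directory and renaming the new one are two separate operations, so there is a brief window in which the partition is absent; a real pipeline would lean on whatever atomic publish mechanism its storage offers.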