Why is Chronos called a distributed and fault-tolerant scheduler? As far as I understand, only one scheduler instance is running and managing the job schedules.
According to the Chronos documentation, the scheduler's main loop is quite simple internally.
The pattern is as follows:
- Chronos reads all job state from the state store (ZooKeeper).
- Jobs are registered within the scheduler and loaded into the job graph for tracking dependencies.
- Jobs are separated into a list of those which should be run at the current time (based on the clock of the host machine), and those which should not.
- Jobs in the list of jobs to run are queued, and will be launched as soon as a sufficient offer becomes available.
- Chronos will sleep until the next job is scheduled to run, and begin again from step 1.
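To check my understanding of the loop, here is a minimal sketch of one iteration as I read it. This is not Chronos code: the `Job` class, the function names, and the plain dict standing in for ZooKeeper are all made up for illustration, and the job graph and Mesos offer handling are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Job:
    # Hypothetical job record; real Chronos jobs carry far more state.
    name: str
    next_run: float  # epoch seconds at which the job should next run


def load_jobs(state_store: Dict[str, float]) -> List[Job]:
    """Read all job state. A dict stands in for ZooKeeper (assumption)."""
    return [Job(name, next_run) for name, next_run in state_store.items()]


def partition_jobs(jobs: List[Job], now: float) -> Tuple[List[Job], List[Job]]:
    """Split jobs into those due at `now` and those that are not."""
    due = [job for job in jobs if job.next_run <= now]
    later = [job for job in jobs if job.next_run > now]
    return due, later


def run_iteration(
    state_store: Dict[str, float], now: float
) -> Tuple[List[str], Optional[float]]:
    """One pass of the loop: load, partition, queue, compute sleep time."""
    jobs = load_jobs(state_store)
    due, later = partition_jobs(jobs, now)
    # Due jobs are queued; launching would wait for a sufficient offer.
    queue = [job.name for job in due]
    # Sleep until the earliest future job, or None if nothing is pending.
    sleep_for = min((job.next_run - now for job in later), default=None)
    return queue, sleep_for
```

For example, with `{"backup": 100.0, "report": 160.0, "cleanup": 90.0}` and `now=100.0`, this queues `backup` and `cleanup` and sleeps for 60 seconds. Each pass starts by rereading everything from the store, so the sketch keeps no job state in the process itself.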
Given this single loop, what makes Chronos distributed and fault-tolerant? Could someone with experience explain?