Airflow 1.9.0 is queuing but not launching tasks

Posted 2019-01-05 03:25

Airflow is randomly not running queued tasks; some tasks don't even get a queued status. I keep seeing the following in the scheduler logs:

 [2018-02-28 02:24:58,780] {jobs.py:1077} INFO - No tasks to consider for execution.

I do see tasks in the database that either have no status or a queued status, but they never get started.
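One way to confirm this is to query the metadata database directly for stuck task instances. The sketch below uses an in-memory SQLite database with a minimal stand-in for Airflow's `task_instance` table (the real table has more columns); the table and column names match Airflow's schema, but the data is made up for illustration.

```python
import sqlite3

# Minimal stand-in for Airflow's task_instance table (the real schema
# has many more columns).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE task_instance (
        task_id TEXT, dag_id TEXT, execution_date TEXT, state TEXT
    )
""")
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?, ?, ?)",
    [
        ("extract", "my_dag", "2018-02-27", "success"),
        ("load",    "my_dag", "2018-02-27", "queued"),  # stuck in queued
        ("extract", "my_dag", "2018-02-28", None),      # no status at all
    ],
)

# Find task instances that are queued or have no state at all.
stuck = conn.execute(
    "SELECT task_id, execution_date, state FROM task_instance "
    "WHERE state IS NULL OR state = 'queued' "
    "ORDER BY execution_date"
).fetchall()
print(stuck)  # → [('load', '2018-02-27', 'queued'), ('extract', '2018-02-28', None)]
```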

The Airflow setup runs https://github.com/puckel/docker-airflow on ECS with Redis. There are 4 scheduler threads and 4 Celery worker tasks. The tasks that are not running show a queued state (grey icon); when hovering over the task icon, the operator is null, and the task details say:

    All dependencies are met but the task instance is not running. In most cases this just means that the task will probably be scheduled soon unless:
    - The scheduler is down or under heavy load

Metrics on the scheduler do not show heavy load. The DAG is very simple, with 2 independent tasks dependent only on the last run. There are also tasks in the same DAG that are stuck with no status (white icon).

An interesting thing to notice is that when I restart the scheduler, the tasks change to the running state.

2 Answers
爱情/是我丢掉的垃圾
#2 · 2019-01-05 03:55

I'm running a fork of the puckel/docker-airflow repo as well, mostly on Airflow 1.8, for about a year, with 10M+ task instances. I think the issue persists in 1.9, but I'm not positive.

For whatever reason, there seems to be a long-standing issue with the Airflow scheduler where performance degrades over time. I've reviewed the scheduler code, but I'm still unclear on what exactly happens differently on a fresh start to kick it back into scheduling normally. One major difference is that scheduled and queued task states are rebuilt.
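That rebuilding step can be pictured in SQL terms: on restart, stale `scheduled` and `queued` states are cleared so those task instances get re-evaluated from scratch. The sketch below illustrates the idea against an in-memory SQLite stand-in for the `task_instance` table; it is an illustration of the concept, not Airflow's actual restart code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_instance (task_id TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?)",
    [("extract", "queued"), ("load", "scheduled"), ("report", "success")],
)

# On restart, stale 'queued'/'scheduled' states are cleared so the
# scheduler re-examines those instances from scratch (illustration only).
conn.execute(
    "UPDATE task_instance SET state = NULL "
    "WHERE state IN ('queued', 'scheduled')"
)
states = dict(conn.execute("SELECT task_id, state FROM task_instance"))
print(states)  # → {'extract': None, 'load': None, 'report': 'success'}
```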

Scheduler Basics in the Airflow wiki provides a concise reference on how the scheduler works and its various states.

Most people solve the problem of diminishing scheduler throughput by restarting the scheduler regularly. I've personally found success with a 1-hour interval, but I've seen intervals as frequent as every 5-10 minutes used too. Your task volume, task duration, and parallelism settings are worth considering when experimenting with a restart interval.


This used to be addressed by restarting every X runs using the SCHEDULER_RUNS config setting, although that setting was recently removed from the default systemd scripts.
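For reference, the stock systemd unit implemented exactly this pattern: the scheduler exits after a fixed number of loops and systemd restarts it. A sketch of the relevant fragment (the install path and run count are assumptions; check your own unit file):

```ini
# Sketch of an airflow-scheduler.service fragment (path is an assumption).
[Service]
Environment="SCHEDULER_RUNS=5"
# Exit after N scheduler loops; Restart=always brings the process back up,
# giving the scheduler a periodic fresh start.
ExecStart=/usr/local/bin/airflow scheduler -n ${SCHEDULER_RUNS}
Restart=always
RestartSec=5s
```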

You might also consider posting to the Airflow dev mailing list. I know this has been discussed there a few times and one of the core contributors may be able to provide additional context.

我欲成王，谁敢阻挡
#3 · 2019-01-05 03:58

Airflow can be a bit tricky to set up.

  • Do you have the airflow scheduler running?
  • Do you have the airflow webserver running?
  • Have you checked that all the DAGs you want to run are set to On in the web UI?
  • Do all the DAGs you want to run have a start date in the past?
  • Do all the DAGs you want to run have a proper schedule, shown in the web UI?
  • If nothing else works, use the web UI to click on the DAG, then on Graph View. Select the first task and click on Task Instance. Under Task Instance Details you will see why a DAG is waiting or not running.

For instance, I once had a DAG that was wrongly set to depends_on_past: True, which prevented the current instance from starting.
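The effect of that flag can be sketched as a simple gate. The function below is a simplified illustration of the depends_on_past dependency check, not Airflow's actual implementation; the name and signature are made up for this example.

```python
# Simplified sketch of the depends_on_past check (not Airflow's real code).
def can_run(depends_on_past, prev_run_state, is_first_run):
    """Return True if this task instance is allowed to start."""
    if not depends_on_past or is_first_run:
        return True
    # With depends_on_past=True, the same task's previous scheduled run
    # must have succeeded before the current one may start.
    return prev_run_state == "success"

print(can_run(False, "failed", False))  # independent of the past → True
print(can_run(True, "failed", False))   # blocked by a failed prior run → False
print(can_run(True, None, True))        # the very first run is exempt → True
```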

There is also a great resource directly in the docs, with a few more hints: Why isn't my task getting scheduled?.
