Setting up S3 logging in Airflow

Posted 2019-07-18 03:26

Question:

This is driving me nuts.

I'm setting up Airflow in a cloud environment: one server runs the scheduler and the webserver, another is a Celery worker, and I'm using Airflow 1.8.0.

Running jobs works fine. What refuses to work is logging.

I've set the remote logging settings in airflow.cfg on both servers:

remote_base_log_folder = s3://my-bucket/airflow_logs/
remote_log_conn_id = s3_logging_conn
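
For completeness, both of these sit under [core] in the 1.8 config; the full block, including encrypt_s3_logs, which I've left at its default, looks like this:

    [core]
    remote_base_log_folder = s3://my-bucket/airflow_logs/
    remote_log_conn_id = s3_logging_conn
    encrypt_s3_logs = False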

I've set up s3_logging_conn in the Airflow UI, with the access key and the secret key, as described here.
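
The connection itself is of type S3, with the credentials in the Extra field as JSON, along these lines (these are the key names the S3 hook expects in the Extra JSON as far as I can tell; the values here are placeholders):

    {"aws_access_key_id": "MY_ACCESS_KEY", "aws_secret_access_key": "MY_SECRET_KEY"}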

I checked the connection using:

import airflow.hooks

# uses the connection configured in the UI and writes a test object to the bucket
s3 = airflow.hooks.S3Hook('s3_logging_conn')
s3.load_string('test', 'test', bucket_name='my-bucket')

This works on both servers, so the connection is set up properly. Yet all I get whenever I run a task is:

*** Log file isn't local.
*** Fetching here: http://*******
*** Failed to fetch log file from worker.
*** Reading remote logs...
Could not read logs from s3://my-bucket/airflow_logs/my-dag/my-task/2018-02-15T21:46:47.577537

I tried manually uploading the log following the expected conventions (spelled out below) and the webserver still can't pick it up, so the problem is on both ends. I'm at a loss as to what to do; everything I've read so far tells me this should be working. I'm close to just installing 1.9.0, which I hear changes logging, to see if I have more luck.
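
By "expected conventions" I mean the key layout the error message itself shows, i.e. roughly:

    {remote_base_log_folder}/{dag_id}/{task_id}/{execution_date}

with the execution date in the same ISO format as above.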

UPDATE: I did a clean install of Airflow 1.9 and followed the specific instructions here.

The webserver won't even start now; it fails with the following error:

airflow.exceptions.AirflowConfigException: section/key [core/remote_logging] not found in config

There is an explicit reference to this section in this config template.
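
From the error it looks like the template simply reads that key with something like this (my paraphrase, not a verbatim quote from the template), which raises the exception above when [core] has no remote_logging entry:

    from airflow import configuration as conf

    # raises AirflowConfigException if airflow.cfg has no
    # remote_logging key under [core]
    REMOTE_LOGGING = conf.get('core', 'remote_logging')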

So I tried removing that check and loading the S3 handler directly, and got the following error message instead:

Unable to load the config, contains a configuration error.
Traceback (most recent call last):
  File "/usr/lib64/python3.6/logging/config.py", line 384, in resolve
    self.importer(used)
ModuleNotFoundError: No module named 'airflow.utils.log.logging_mixin.RedirectStdHandler'; 'airflow.utils.log.logging_mixin' is not a package
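
For reference, my understanding of the setup those 1.9 instructions describe: copy the logging config template to something like $AIRFLOW_HOME/config/log_config.py (with an empty __init__.py next to it and that directory on the PYTHONPATH), swap the task handler for the S3 one, and point airflow.cfg at it. A trimmed-down sketch of that file follows; my actual file is a full copy of the template, RedirectStdHandler and all, which is where the reference in the error above comes from, so treat names and the filename template here as assumptions:

    # $AIRFLOW_HOME/config/log_config.py (sketch; plain StreamHandler stands in
    # for the template's console handler)
    import os

    BASE_LOG_FOLDER = os.path.expanduser('~/airflow/logs')
    S3_LOG_FOLDER = 's3://my-bucket/airflow_logs/'
    LOG_FORMAT = '[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s'

    LOGGING_CONFIG = {
        'version': 1,
        'disable_existing_loggers': False,
        'formatters': {
            'airflow.task': {'format': LOG_FORMAT},
        },
        'handlers': {
            'console': {
                'class': 'logging.StreamHandler',
                'formatter': 'airflow.task',
                'stream': 'ext://sys.stdout',
            },
            # writes task logs locally, then uploads them to S3 when the task finishes
            's3.task': {
                'class': 'airflow.utils.log.s3_task_handler.S3TaskHandler',
                'formatter': 'airflow.task',
                'base_log_folder': BASE_LOG_FOLDER,
                's3_log_folder': S3_LOG_FOLDER,
                'filename_template': '{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log',
            },
        },
        'loggers': {
            # task instances log through the 'airflow.task' logger
            'airflow.task': {
                'handlers': ['s3.task'],
                'level': 'INFO',
                'propagate': False,
            },
        },
        'root': {
            'handlers': ['console'],
            'level': 'INFO',
        },
    }

and in airflow.cfg:

    [core]
    task_log_reader = s3.task
    logging_config_class = log_config.LOGGING_CONFIG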

I get the feeling that this shouldn't be this hard.

Any help would be much appreciated, cheers

Answer 1:

Solved:

  1. upgraded to 1.9
  2. ran the steps described in this comment
  3. added the following to airflow.cfg:

    [core]
    remote_logging = True

  4. ran

    pip install --upgrade airflow[log]

Everything's working fine now.
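
For anyone else landing here, the relevant [core] block that ended up working for me looks roughly like this (bucket name and connection ID are of course my own):

    [core]
    remote_logging = True
    remote_base_log_folder = s3://my-bucket/airflow_logs/
    remote_log_conn_id = s3_logging_conn
    encrypt_s3_logs = False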