I'm using Airflow version 1.9, and there is a bug in the software that you can read about here in my previous Stack Overflow post, here in another one of my Stack Overflow posts, and here on Airflow's GitHub, where the bug is reported and discussed.
Long story short, there are a few places in Airflow's code where it needs to get the IP address of the server, and it does that by calling:
socket.getfqdn()
The problem is that on Amazon EC2 instances (Amazon Linux 1) this call doesn't return the IP address; instead it returns the hostname, like this:
ip-1-2-3-4
Whereas it needs the IP address, like this:
1.2.3.4
To get the IP address, I found here that I can use this command instead:
socket.gethostbyname(socket.gethostname())
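I've tested the replacement command in a Python shell on the instance and it returns the proper value. The check looked roughly like this (the address is a placeholder matching the example above):

>>> import socket
>>> socket.getfqdn()
'ip-1-2-3-4'
>>> socket.gethostbyname(socket.gethostname())
'1.2.3.4'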
So I ran a search on the Airflow package to find all occurrences of socket.getfqdn(), and this is what I got back:
[airflow@ip-1-2-3-4 site-packages]$ cd airflow/
[airflow@ip-1-2-3-4 airflow]$ grep -r "fqdn" .
./security/utils.py: fqdn = host
./security/utils.py: if not fqdn or fqdn == '0.0.0.0':
./security/utils.py: fqdn = get_localhost_name()
./security/utils.py: return '%s/%s@%s' % (components[0], fqdn.lower(), components[2])
./security/utils.py: return socket.getfqdn()
./security/utils.py:def get_fqdn(hostname_or_ip=None):
./security/utils.py: fqdn = socket.gethostbyaddr(hostname_or_ip)[0]
./security/utils.py: fqdn = get_localhost_name()
./security/utils.py: fqdn = hostname_or_ip
./security/utils.py: if fqdn == 'localhost':
./security/utils.py: fqdn = get_localhost_name()
./security/utils.py: return fqdn
Binary file ./security/__pycache__/utils.cpython-36.pyc matches
Binary file ./security/__pycache__/kerberos.cpython-36.pyc matches
./security/kerberos.py: principal = configuration.get('kerberos', 'principal').replace("_HOST", socket.getfqdn())
./security/kerberos.py: principal = "%s/%s" % (configuration.get('kerberos', 'principal'), socket.getfqdn())
Binary file ./contrib/auth/backends/__pycache__/kerberos_auth.cpython-36.pyc matches
./contrib/auth/backends/kerberos_auth.py: service_principal = "%s/%s" % (configuration.get('kerberos', 'principal'), utils.get_fqdn())
./www/views.py: 'airflow/circles.html', hostname=socket.getfqdn()), 404
./www/views.py: hostname=socket.getfqdn(),
Binary file ./www/__pycache__/app.cpython-36.pyc matches
Binary file ./www/__pycache__/views.cpython-36.pyc matches
./www/app.py: 'hostname': socket.getfqdn(),
Binary file ./__pycache__/jobs.cpython-36.pyc matches
Binary file ./__pycache__/models.cpython-36.pyc matches
./bin/cli.py: hostname = socket.getfqdn()
Binary file ./bin/__pycache__/cli.cpython-36.pyc matches
./config_templates/default_airflow.cfg:# gets augmented with fqdn
./jobs.py: self.hostname = socket.getfqdn()
./jobs.py: fqdn = socket.getfqdn()
./jobs.py: same_hostname = fqdn == ti.hostname
./jobs.py: "{fqdn}".format(**locals()))
Binary file ./api/auth/backend/__pycache__/kerberos_auth.cpython-36.pyc matches
./api/auth/backend/kerberos_auth.py:from socket import getfqdn
./api/auth/backend/kerberos_auth.py: hostname = getfqdn()
./models.py: self.hostname = socket.getfqdn()
./models.py: self.hostname = socket.getfqdn()
I'm unsure whether or not I should just replace all occurrences of socket.getfqdn() with socket.gethostbyname(socket.gethostname()). For one thing, this would be cumbersome to maintain, since I would no longer be running the Airflow package exactly as I installed it from pip. I tried upgrading to Airflow version 1.10, but it was very buggy and I couldn't get it up and running. So it seems like I'm stuck with Airflow version 1.9 for now, but I need to correct this Airflow bug because it's causing my tasks to sporadically fail.
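Just to be concrete, the replacement I'm considering would amount to something like the sketch below, pointed at wherever pip installed the package (the site-packages path here is only an example from my machine). Even then it wouldn't catch the from socket import getfqdn / getfqdn() usage in ./api/auth/backend/kerberos_auth.py from the grep output above:

import pathlib

OLD = "socket.getfqdn()"
NEW = "socket.gethostbyname(socket.gethostname())"

# Example location only -- wherever pip installed Airflow on this machine
airflow_dir = pathlib.Path("/usr/local/lib/python3.6/site-packages/airflow")

for path in airflow_dir.rglob("*.py"):
    text = path.read_text()
    if OLD in text:
        # Rewrite the file in place, swapping in the hostname-based lookup
        path.write_text(text.replace(OLD, NEW))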