How to run Airflow on Windows

Posted 2019-01-26 05:08

Question:

The usual instructions for running Airflow do not work in a Windows environment:

# airflow needs a home, ~/airflow is the default,
# but you can lay foundation somewhere else if you prefer
# (optional)
export AIRFLOW_HOME=~/airflow

# install from pypi using pip
pip install airflow

# initialize the database
airflow initdb

# start the web server, default port is 8080
airflow webserver -p 8080

The airflow utility is not available on the command line, and I can't find it anywhere else to add it manually. How can Airflow run on Windows?

Answer 1:

You can enable Bash on Windows and follow the tutorial as is. I was able to get up and running by following the steps above.

Once you are done installing, edit airflow.cfg so that all configured paths point to locations on your Windows file system rather than inside lxss (the Ubuntu file system), since there are bugs where Ubuntu does not show files written from the Windows side.
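
As a rough illustration of the airflow.cfg edit described above, the following Python sketch rewrites two path settings. It assumes the [core] section and the dags_folder and base_log_folder keys used by Airflow 1.x configuration files, and the /mnt/c/... paths are hypothetical examples of how the Windows C: drive appears from inside the Linux subsystem. The function names are made up for this example. Note that configparser discards comments when it writes the file back, so editing by hand may be preferable for a real installation.

```python
import configparser
import os
import tempfile


def point_airflow_at(cfg_path, dags_folder, log_folder):
    """Rewrite two path settings in an airflow.cfg-style file.

    Assumes a [core] section with dags_folder and base_log_folder keys.
    Comments in the file are lost, because configparser does not keep them.
    """
    # interpolation=None keeps literal '%' characters in values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path)
    if not parser.has_section("core"):
        parser.add_section("core")
    parser.set("core", "dags_folder", dags_folder)
    parser.set("core", "base_log_folder", log_folder)
    with open(cfg_path, "w") as handle:
        parser.write(handle)


def read_core_setting(cfg_path, key):
    """Return one value from the [core] section of the file."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path)
    return parser.get("core", key)


if __name__ == "__main__":
    # Demonstrate on a throwaway file instead of a real airflow.cfg.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = os.path.join(tmp, "airflow.cfg")
        with open(cfg, "w") as handle:
            handle.write("[core]\ndags_folder = /home/me/airflow/dags\n")
        # Hypothetical Windows-side folders as seen from the Linux subsystem.
        point_airflow_at(cfg, "/mnt/c/airflow/dags", "/mnt/c/airflow/logs")
        print(read_core_setting(cfg, "dags_folder"))
```

Trying this on a copy of the file first makes it easy to compare the result with the original before replacing it.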



Answer 2:

Instead of installing Airflow via pip, download the zip from the Airflow project's GitHub page, unzip it, and run python setup.py install on the command line from inside the extracted folder. Errors such as ERROR - 'module' object has no attribute 'SIGALRM' will appear, but so far these have had no impact on Airflow's functions.
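
The SIGALRM message appears because Python's signal module only defines SIGALRM on Unix-like systems. A minimal check, which is not part of Airflow and uses a made-up helper name, can confirm what your own interpreter provides:

```python
import signal
import sys


def has_sigalrm():
    """Return True when this interpreter exposes signal.SIGALRM."""
    return hasattr(signal, "SIGALRM")


if __name__ == "__main__":
    # On native Windows Python this is expected to report False.
    print("platform:", sys.platform, "SIGALRM available:", has_sigalrm())
```

Any Airflow feature that relies on alarm-based timeouts would be affected where this reports False, which is consistent with the caveat at the end of this answer about untested DAG runs.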

Using this method, the airflow utility will not be available as a command. As a workaround, use the [current folder]\build\scripts-2.7\airflow file, which is the Python script for the airflow utility.

Another solution is to create a batch file (airflow.bat) that runs airflow, and add the folder containing it to the system PATH variable:

python C:\path\to\airflow %*
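
The same wrapper idea can be sketched in Python instead of a batch file. This is only an illustration: AIRFLOW_SCRIPT is a placeholder that must be replaced with the real location of the airflow script, and build_command is a hypothetical helper name.

```python
import os
import subprocess
import sys

# Placeholder: replace with the real path to the airflow script.
AIRFLOW_SCRIPT = r"C:\path\to\airflow"


def build_command(args, script=AIRFLOW_SCRIPT):
    """Build the argument list that the batch file would run."""
    # sys.executable is the interpreter running this launcher.
    return [sys.executable, script] + list(args)


if __name__ == "__main__":
    if os.path.exists(AIRFLOW_SCRIPT):
        # Forward every command-line argument, like %* in the batch file.
        sys.exit(subprocess.call(build_command(sys.argv[1:])))
    print("Set AIRFLOW_SCRIPT to the real path first:", AIRFLOW_SCRIPT)
```

Saved as, say, run_airflow.py, it would be invoked as python run_airflow.py webserver -p 8080.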

From this point, the tutorial may be followed normally:

airflow initdb
airflow webserver -p 8080

I have not tested whether, or how well, Airflow's DAGs run on Windows.



Answer 3:

Unfortunately, the answer to this seems to be "No" as of Dec 2015; see https://github.com/airbnb/airflow/issues/709. This is because of the move to gunicorn. gunicorn may get Windows support in R18.



Answer 4:

I went through a few iterations of this problem and documented them as I went along. The three things I tried were:

  1. Install Airflow directly into Windows 10 - This attempt failed.
  2. Install Airflow into Windows 10 WSL with Ubuntu - This worked great. Note that WSL is Windows Subsystem for Linux, which you can get for free in the Windows store.
  3. Install Airflow into Windows 10 via Docker + CentOS - This worked great as well.

Note that if you want to get it running as a Linux service, that is not possible with option 2. It is possible with option 3, but I didn't do it because it requires enabling privileged containers in Docker (which I wasn't aware of when I started). Also, running a service inside Docker goes somewhat against the container paradigm, since each container should be a single process or unit of responsibility anyway.

If you're going for option 2, the basic steps are:

  • Get WSL Ubuntu installed and opened up.
  • Verify it comes with Python 3.6.5 or so ("python3 --version").
  • Assuming it still does, add these packages so that installing pip will work:
    • sudo apt-get install software-properties-common
    • sudo apt-add-repository universe
    • sudo apt-get update
  • Install pip with:
    • sudo apt-get install python-pip
  • Run the following 2 commands to install airflow:
    • export SLUGIFY_USES_TEXT_UNIDECODE=yes
    • pip install apache-airflow
  • Open a new terminal (I was surprised, but this seemed to be required).
  • Init the airflow DB:
    • airflow initdb

After this, you should be good to go! The blog has more detail on many of these steps and rough timelines for how long setting up WSL takes, so if you have a hard time, dive in there some more.
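
Regarding the new-terminal step above: one possible explanation, which is an assumption and not something this answer states, is that pip placed the airflow script in a per-user directory such as ~/.local/bin that only joins PATH when a fresh shell starts. A short Python check along those lines, with hypothetical helper names:

```python
import os
import shutil


def find_command(name, search_path=None):
    """Return the full path of an executable, or None if it is not on PATH."""
    return shutil.which(name, path=search_path)


def user_bin_on_path(path_value=None):
    """Report whether ~/.local/bin appears in a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    user_bin = os.path.expanduser("~/.local/bin")
    return user_bin in path_value.split(os.pathsep)


if __name__ == "__main__":
    print("airflow found at:", find_command("airflow"))
    print("~/.local/bin on PATH:", user_bin_on_path())
```

If the first line prints None in the old terminal but a real path in a new one, the PATH explanation fits your setup.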



Answer 5:

You can do it using Cygwin. Cygwin is a command-line shell that runs on Windows and emulates Linux, so you'll be able to run the commands:

# airflow needs a home, ~/airflow is the default,
# but you can lay foundation somewhere else if you prefer
# (optional)
export AIRFLOW_HOME=~/airflow

# install from pypi using pip
pip install apache-airflow

# initialize the database
airflow initdb

# start the web server, default port is 8080
airflow webserver -p 8080

Note 1: If you're running Cygwin on a company-supplied computer, you may need to run the Cygwin application as an administrator. You can do so by following this tutorial from Microsoft.

Note 2: If, like me, you are behind a proxy (at work or elsewhere), you'll need to set two environment variables for pip to work on the command line, in this case Cygwin. You can follow this StackOverflow answer for more details. I set the following two environment variables on my Windows machine:

// Note this first entry has an S in HTTPS and the other entry is just regular HTTP. Don't forget that distinction in the key name and in the url of the value.
HTTPS_PROXY=https://myUsernameGoesHere:myPasswordGoesHere@yourProxyHostNameGoesHere:yourProxyPortNumberGoesHere

HTTP_PROXY=http://myUsernameGoesHere:myPasswordGoesHere@yourProxyHostNameGoesHere:yourProxyPortNumberGoesHere
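
Usernames and passwords that contain characters such as @ or : generally need to be percent-encoded before they go into a proxy URL like the ones above. A small sketch of building the value in Python; the function name and every value passed to it are made-up placeholders:

```python
from urllib.parse import quote


def proxy_url(scheme, user, password, host, port):
    """Build scheme://user:password@host:port with encoded credentials."""
    # safe="" forces every reserved character, including '@' and ':', to be encoded.
    return "{}://{}:{}@{}:{}".format(
        scheme, quote(user, safe=""), quote(password, safe=""), host, port
    )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(proxy_url("http", "jane", "p@ss:word", "proxy.example.com", 8080))
```

The printed string can then be pasted in as the value of HTTP_PROXY or HTTPS_PROXY.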

No Longer Works: Apparently all of the above work was in vain, because Airflow won't work on Windows. Please see this StackOverflow post. The above steps will still allow you to use pip, though.

Alternatively, and I know this may or may not count as running on Windows, you could install virtual machine software such as Oracle's VirtualBox or VMware Workstation, set up whatever Linux distribution you want, such as Ubuntu Desktop, and then run Linux normally. If you need more detailed steps, you can follow this Ask Ubuntu answer from the Stack Exchange community.

Alternatively (2), you could create an AWS account, set up a simple EC2 instance running Linux, ssh into that instance, and run all your commands to your heart's content. AWS offers a free tier, so you should be able to do it for free. AWS is also very well documented, so it shouldn't be too hard to get a simple Linux server up and running; I estimate a beginner could be done in about an hour.