Failed to extract xcom from airflow pod - Kubernet

Posted 2019-08-07 21:33

Question:

I am running a DAG whose task runs a JAR inside a Docker image.
The task sets xcom_push=True, which creates a second container alongside the main container in the same pod.

DAG:

    jar_task = KubernetesPodOperator(
        namespace='test',
        image="path to image",
        image_pull_secrets="secret",
        image_pull_policy="Always",
        node_selectors={"d-type":"na-node-group"},
        cmds=["sh","-c",..~running jar here~..],
        secrets=[secret_file],
        env_vars=environment_vars,
        labels={"k8s-app": "airflow"},
        name="airflow-pod",
        config_file=k8s_config_file,
        resources=pod.Resources(request_cpu=0.2,limit_cpu=0.5,request_memory='512Mi',limit_memory='1536Mi'),
        in_cluster=False,
        task_id="run_jar",
        is_delete_operator_pod=True,
        get_logs=True,
        xcom_push=True,
        dag=dag)

The JAR itself runs successfully, but the task then fails with the following errors:

    [2018-11-27 11:37:21,605] {{logging_mixin.py:95}} INFO - [2018-11-27 11:37:21,605] {{pod_launcher.py:166}} INFO - Running command... cat /airflow/xcom/return.json
    [2018-11-27 11:37:21,605] {{logging_mixin.py:95}} INFO - 
    [2018-11-27 11:37:21,647] {{logging_mixin.py:95}} INFO - [2018-11-27 11:37:21,646] {{pod_launcher.py:173}} INFO - cat: can't open '/airflow/xcom/return.json': No such file or directory
    [2018-11-27 11:37:21,647] {{logging_mixin.py:95}} INFO - 
    [2018-11-27 11:37:21,647] {{logging_mixin.py:95}} INFO - [2018-11-27 11:37:21,647] {{pod_launcher.py:166}} INFO - Running command... kill -s SIGINT 1
    [2018-11-27 11:37:21,647] {{logging_mixin.py:95}} INFO - 
    [2018-11-27 11:37:21,702] {{models.py:1760}} ERROR - Pod Launching failed: Failed to extract xcom from pod: airflow-pod-hippogriff-a4628b12
    Traceback (most recent call last):
      File "/usr/local/airflow/operators/kubernetes_pod_operator.py", line 126, in execute
        get_logs=self.get_logs)
      File "/usr/local/airflow/operators/pod_launcher.py", line 90, in run_pod
        return self._monitor_pod(pod, get_logs)
      File "/usr/local/airflow/operators/pod_launcher.py", line 110, in _monitor_pod
        result = self._extract_xcom(pod)
      File "/usr/local/airflow/operators/pod_launcher.py", line 161, in _extract_xcom
        raise AirflowException('Failed to extract xcom from pod: {}'.format(pod.name))
    airflow.exceptions.AirflowException: Failed to extract xcom from pod: airflow-pod-hippogriff-a4628b12

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 1659, in _run_raw_task
        result = task_copy.execute(context=context)
      File "/usr/local/airflow/operators/kubernetes_pod_operator.py", line 138, in execute
        raise AirflowException('Pod Launching failed: {error}'.format(error=ex))
    airflow.exceptions.AirflowException: Pod Launching failed: Failed to extract xcom from pod: airflow-pod-hippogriff-a4628b12
    [2018-11-27 11:37:21,704] {{models.py:1789}} INFO - All retries failed; marking task as FAILED

Answer 1:

If xcom_push is True, KubernetesPodOperator creates an additional sidecar container (airflow-xcom-sidecar) in the pod alongside the base container (the actual worker container). The sidecar reads the contents of /airflow/xcom/return.json and returns them as the XCom value. Your base container therefore has to write the data you want to return to /airflow/xcom/return.json.
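As a minimal sketch of what the base container has to do, the Python helper below serializes a value as JSON and writes it to the path the sidecar reads. The function name `write_xcom_result` is hypothetical and not part of Airflow; a JAR would need to do the equivalent from Java or from its wrapping shell command.

```python
import json
import os

# Path the sidecar reads, as described in the answer above.
XCOM_PATH = "/airflow/xcom/return.json"


def write_xcom_result(value, path=XCOM_PATH):
    """Serialize `value` as JSON and write it to `path`.

    Hypothetical helper for illustration; not an Airflow API.
    """
    directory = os.path.dirname(path)
    if directory:
        # Create the target directory if it is missing (e.g. in local tests).
        os.makedirs(directory, exist_ok=True)
    with open(path, "w") as f:
        json.dump(value, f)
    return path
```

Calling `write_xcom_result({"status": "done"})` as the last step of the job would leave a JSON file for the sidecar to pick up; the `path` argument exists only so the helper can be tried outside a pod.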



Answer 2:

This happens because the task's result is not written to the path where KubernetesPodOperator expects to find the XCom value. The following unit test from the Airflow repository shows how it should be done (the snippet is reproduced here for convenience, followed by a link to the repository):

    def test_xcom_push(self):
        return_value = '{"foo": "bar"\n, "buzz": 2}'
        k = KubernetesPodOperator(
            namespace='default',
            image="ubuntu:16.04",
            cmds=["bash", "-cx"],
            arguments=['echo \'{}\' > /airflow/xcom/return.json'.format(return_value)],
            labels={"foo": "bar"},
            name="test",
            task_id="task",
            xcom_push=True
        )
        self.assertEqual(k.execute(None), json.loads(return_value))

https://github.com/apache/incubator-airflow/blob/36f3bfb0619cc78698280f6ec3bc985f84e58343/tests/contrib/minikube/test_kubernetes_pod_operator.py#L321

Edit: note that the value written to the XCom file must be valid JSON.
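One way to catch a non-JSON value early is to validate it before building the container command. The sketch below mirrors the `arguments` used in the unit test; `build_xcom_arguments` is a hypothetical helper, not part of Airflow, and it assumes the container runs the string through a POSIX shell (as with `cmds=["bash", "-cx"]`).

```python
import json
import shlex

XCOM_PATH = "/airflow/xcom/return.json"


def build_xcom_arguments(return_value):
    """Return an `arguments` list that writes `return_value` to the XCom file.

    Raises ValueError if `return_value` is not valid JSON.
    Hypothetical helper for illustration; not an Airflow API.
    """
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on bad input.
    json.loads(return_value)
    # shlex.quote protects quotes and spaces inside the JSON text from the shell.
    return ["echo {} > {}".format(shlex.quote(return_value), XCOM_PATH)]
```

The result could be passed as `arguments=build_xcom_arguments('{"foo": "bar"}')` in place of the hand-built string in the test.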



Answer 3:

I want to point out an error I ran into with XCom and KubernetesPodOperator. It did not have the same cause as the OP's, but this is the only question I could find about KPO and XCom, so it may help anyone else who lands here.

I am using Google Cloud Platform (GCP) Cloud Composer, which runs a slightly older Airflow version than the latest. The official GitHub repository says to use do_xcom_push, whereas the older Airflow uses the argument xcom_push instead.
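If the same DAG has to run on both older and newer Airflow versions, one possible approach is to check which keyword the operator class accepts before passing it. This is only a sketch: `accepts_param` and `xcom_flag` are hypothetical helpers, the two classes are stand-ins rather than real Airflow operators, and the check assumes the keyword is named explicitly in some `__init__` along the class hierarchy.

```python
import inspect


def accepts_param(cls, name):
    """Return True if any class in `cls`'s MRO names `name` in its own __init__."""
    for klass in cls.__mro__:
        init = vars(klass).get("__init__")
        if init is None:
            continue
        try:
            params = inspect.signature(init).parameters
        except (TypeError, ValueError):
            # Some built-in callables expose no introspectable signature.
            continue
        if name in params:
            return True
    return False


def xcom_flag(operator_cls):
    """Choose between the newer `do_xcom_push` and the older `xcom_push` keyword."""
    if accepts_param(operator_cls, "do_xcom_push"):
        return {"do_xcom_push": True}
    return {"xcom_push": True}


# Hypothetical stand-ins for an older and a newer operator class.
class OldStyleOperator:
    def __init__(self, task_id, xcom_push=False, **kwargs):
        self.task_id = task_id


class NewStyleBase:
    def __init__(self, task_id, do_xcom_push=True, **kwargs):
        self.task_id = task_id


class NewStyleOperator(NewStyleBase):
    def __init__(self, name, **kwargs):
        super().__init__(**kwargs)
        self.name = name
```

With a real operator class the result could be unpacked into the constructor call, for example `KubernetesPodOperator(..., **xcom_flag(KubernetesPodOperator))`, after confirming that the installed version behaves as assumed here.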