airflow: import error

Apache Airflow version: 2.0.2

Kubernetes version:

  • Client Version: v1.17.4
  • Server Version: v1.18.14

Environment:

  • Cloud provider or hardware configuration: Azure
  • OS: "Debian GNU/Linux 10 (buster)"
  • Kernel: Linux airflow2-webserver-96bfc89f7-qwk26 5.4.0-1043-azure #45~18.04.1-Ubuntu SMP Sat Mar 20 16:16:05 UTC 2021 x86_64 GNU/Linux

What happened:

My folder structure is the following:

/opt/airflow
├── dags
│   ├── __init__.py
│   ├── example_k8s
│   │   └── test.py
│   └── utils
│       ├── defaults.py
│       └── __init__.py
└── ...

In test.py I am importing from defaults.py as follows:

from utils.defaults import DEFAULT_ARGS, DEFAULT_NAMESPACE

This raises an error in the UI:

Broken DAG: [/opt/airflow/dags/example_k8s/test.py] Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/airflow/dags/example_k8s/test.py", line 200, in <module>
    from utils.defaults import DEFAULT_ARGS, DEFAULT_NAMESPACE
ModuleNotFoundError: No module named 'utils'

I am not sure why, since the folder /opt/airflow/dags should already be included in the path (at least in the worker pod): when I print sys.path from a PythonOperator I get

['/home/airflow/.local/bin', '/usr/local/lib/python36.zip', '/usr/local/lib/python3.6', '/usr/local/lib/python3.6/lib-dynload', '/home/airflow/.local/lib/python3.6/site-packages', '/usr/local/lib/python3.6/site-packages', '/opt/airflow/dags', '/opt/airflow/config', '/opt/airflow/plugins']
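
For reference, a minimal sketch of the kind of debug task that prints sys.path; the DAG id, dates, and task id here are illustrative, not taken from the original setup:

import sys
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def print_sys_path():
    # the full path list shows up in the task log for this run
    print(sys.path)

with DAG(
    dag_id="print_sys_path",  # illustrative name
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    PythonOperator(task_id="show_path", python_callable=print_sys_path)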

Also, executing the command

python /opt/airflow/dags/example_k8s/test.py

in the webserver does not raise any error.

If I explicitly add /opt/airflow/dags to the system path in test.py

sys.path.insert(0, '/opt/airflow/dags/')

the error in the UI is no longer reported and the DAG appears in the UI.
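
Note that the sys.path line has to run before the failing import, i.e. at the top of test.py; a minimal sketch of the relevant part:

# top of test.py -- the sys.path tweak must come before the utils import
import sys
sys.path.insert(0, '/opt/airflow/dags/')

from utils.defaults import DEFAULT_ARGS, DEFAULT_NAMESPACE
# ... rest of the DAG definition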

What you expected to happen:

The import to work without explicitly adding the path to sys.path.

How to reproduce it: See above

Anything else we need to know:

Airflow has been installed using Helm

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 2
  • Comments: 20 (5 by maintainers)

Most upvoted comments

@zorzigio I use the Airflow Helm chart, and the image info is below:

image:
  repository: apache/airflow
  tag: 2.0.2-python3.8

I just added PYTHONPATH to the config env:

#values.yaml
airflow:
  config:
    PYTHONPATH: /opt/airflow/dags/repo # repo is created because of gitsync 

Then the import error was gone 👍👍. My dags folder tree:

dags
├── __init__.py
├── example.py
├── utils
│   └── decorator.py
└── category_1
    ├── file_1.py
    └── file_2.py

Now I can do from dags import some_function in file_1 and file_2. I can also do from dags.utils.decorator import some in example.py 😁
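
For illustration, a hypothetical file_1.py under this layout (some_function and some are the commenter's placeholder names, not real Airflow symbols):

# file_1.py -- the imports resolve because PYTHONPATH points at the
# gitsync repo root, which contains the dags/ package
from dags import some_function         # assumed to be defined in dags/__init__.py
from dags.utils.decorator import some  # the same style works in example.py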

Our team had the same issue with imports. We finally decided to move the utils stuff out of the dags folder and into plugins, and it works like a charm.

The resulting structure looks like this:

/opt/airflow
├── dags
│   ├── __init__.py
│   └── example_k8s
│       └── test.py
├── plugins
│   └── utils
│       ├── defaults.py
│       └── __init__.py
└── ...
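
With this layout the original import in test.py keeps working unchanged, since /opt/airflow/plugins is already on sys.path (it appears in the sys.path output above):

# test.py -- no sys.path tweak needed; utils now resolves from
# /opt/airflow/plugins instead of /opt/airflow/dags
from utils.defaults import DEFAULT_ARGS, DEFAULT_NAMESPACE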

For everyone deploying an Airflow k8s cluster with gitsync, some extra steps are required. (We haven't tested it yet, but we are going to do that and update the result here.)


A workaround is to set PYTHONPATH directly as an env variable in values.yaml:

env:
  - name: PYTHONPATH
    value: /opt/airflow/dags

Note: I upgraded Airflow to version 2.1.0

Note 2: there is a note in the Airflow values.yaml just above the extraEnv key which states

# TODO: difference from `env`? This is a templated string. Probably should template `env` and remove this.
extraEnv: ~

so this solution will probably need to be modified for future versions of the chart to:

env: | # <-- note the pipe here
  - name: PYTHONPATH
    value: /opt/airflow/dags