airflow: Volume is missing for sshKeySecret when dag persistence is enabled.

Official Helm Chart version

1.7.0 (latest released)

Apache Airflow version

2.3.4

Kubernetes Version

1.22.6

Helm Chart configuration

# Git sync
dags:
  persistence:
    enabled: true
    size: 1Gi
    storageClassName:
    accessMode: ReadWriteOnce
    existingClaim: airflow-dags
    subPath: ~
  gitSync:
    enabled: true
    repo: "ssh://git@..."
    branch: "main"
    rev: HEAD
    depth: 1
    maxFailures: 0
    subPath: ""
    sshKeySecret: airflow-ssh-secret

Docker Image customisations

No response

What happened

This commit: https://github.com/apache/airflow/commit/3fc895b9dfe8e7b77538bd80754fb17ccf92db49 causes the following error because volumeMount is created but the volume is missing:

upgrade.go:369: [debug] warning: Upgrade "airflow" failed: cannot patch "airflow-scheduler" with kind Deployment: Deployment.apps "airflow-scheduler" is invalid: spec.template.spec.containers[2].volumeMounts[0].name: Not found: "dags"

What you think should happen instead

I don’t see any relation between DAG persistence and the git-sync configuration. The same configuration works when credentialsSecret is defined, but it stopped working with sshKeySecret.
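
For comparison, here is a minimal sketch of the variant the report describes as working, with persistence and git-sync both enabled but authenticating through credentialsSecret instead of sshKeySecret. The Secret name git-credentials and the HTTPS repo placeholder are assumptions for illustration. The chart documentation describes the Secret as holding GIT_SYNC_USERNAME and GIT_SYNC_PASSWORD keys.

```yaml
# Sketch only: same persistence settings as the report, different auth method.
dags:
  persistence:
    enabled: true
    existingClaim: airflow-dags
  gitSync:
    enabled: true
    repo: "https://..."                 # placeholder: HTTPS URL of the same repository
    branch: "main"
    rev: HEAD
    depth: 1
    credentialsSecret: git-credentials  # hypothetical Secret name (assumption)
```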

How to reproduce

Follow commit https://github.com/apache/airflow/commit/3fc895b9dfe8e7b77538bd80754fb17ccf92db49

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

About this issue

  • State: open
  • Created 2 years ago
  • Reactions: 7
  • Comments: 19 (14 by maintainers)

Most upvoted comments

Came to report the same bug with Helm chart version 1.8.0; quite a discussion going on here 😅.

For anyone wishing to get a very quick workaround, add the below override to create the missing volume:

scheduler:
  extraVolumes:
    - name: git-sync-ssh-key
      secret:
        secretName: airflow-ssh-secret

I am using GitHub Enterprise. Personally, I want to use both git-sync and a persistent volume for redundancy, as relying solely on git-sync would make GitHub a single point of failure for all my DAGs.

I think a slightly more involved way to go would be to create a git-sync K8s Job in CI/CD. That way, instead of constantly polling GitHub, git-sync becomes PR event-driven.
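
A rough sketch of what such a one-shot Job could look like, assuming a git-sync v3 image and the same PVC and SSH Secret names used in this issue. The Job name, image tag, mount paths, and the gitSshKey Secret key are assumptions for illustration, not something the chart generates. A CI/CD pipeline would create the Job on each merge.

```yaml
# Hypothetical one-shot sync Job: clones the repo into the DAGs PVC and exits.
apiVersion: batch/v1
kind: Job
metadata:
  name: dags-git-sync-once            # hypothetical name
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      securityContext:
        fsGroup: 65533                # git-sync's default group, so it can read the key
      containers:
        - name: git-sync
          image: registry.k8s.io/git-sync/git-sync:v3.6.3   # assumed v3 image tag
          env:
            - name: GIT_SYNC_REPO
              value: "ssh://git@..."  # elided in the issue; use your own repo URL
            - name: GIT_SYNC_BRANCH
              value: "main"
            - name: GIT_SYNC_REV
              value: "HEAD"
            - name: GIT_SYNC_DEPTH
              value: "1"
            - name: GIT_SYNC_ROOT
              value: "/git"
            - name: GIT_SYNC_ONE_TIME
              value: "true"           # sync once, then exit instead of polling
            - name: GIT_SYNC_SSH
              value: "true"
            - name: GIT_SSH_KEY_FILE
              value: "/etc/git-secret/ssh"
            - name: GIT_KNOWN_HOSTS
              value: "false"          # skips host key checking; tighten for production
          volumeMounts:
            - name: dags
              mountPath: /git
            - name: git-sync-ssh-key
              mountPath: /etc/git-secret/ssh
              subPath: gitSshKey      # assumed key name inside the Secret
              readOnly: true
      volumes:
        - name: dags
          persistentVolumeClaim:
            claimName: airflow-dags
        - name: git-sync-ssh-key
          secret:
            secretName: airflow-ssh-secret
            defaultMode: 288          # decimal for octal 0440
```

With this approach the chart's own gitSync sidecar would presumably be disabled and only dags.persistence kept, so the scheduler reads whatever the Job last wrote to the volume.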

@potiuk, the biggest gap with git-sync is that we effectively have no way to poll less frequently. This isn’t a concern at small scale, but there is a point with many instances using a monorepo where it becomes problematic.

There are probably better solutions to that problem, but polling less frequently when you aren’t on a single LocalExecutor is asking for heartache eventually. Unfortunately, “polling less frequently” is a natural knob to reach for if you hit that case, or if you think you might and want to get ahead of it.

Relying on (presumably) syncing from GitHub also introduces risk in your production system; persistence is at least “local” to some degree. I’m generally pro git-sync without persistence, but there are definitely scenarios where it isn’t enough.