airflow: Volume is missing for sshKeySecret when dag persistence is enabled.
Official Helm Chart version
1.7.0 (latest released)
Apache Airflow version
2.3.4
Kubernetes Version
1.22.6
Helm Chart configuration
# Git sync
dags:
  persistence:
    enabled: true
    size: 1Gi
    storageClassName:
    accessMode: ReadWriteOnce
    existingClaim: airflow-dags
    subPath: ~
  gitSync:
    enabled: true
    repo: "ssh://git@..."
    branch: "main"
    rev: HEAD
    depth: 1
    maxFailures: 0
    subPath: ""
    sshKeySecret: airflow-ssh-secret
Docker Image customisations
No response
What happened
This commit: https://github.com/apache/airflow/commit/3fc895b9dfe8e7b77538bd80754fb17ccf92db49 causes the following error because volumeMount is created but the volume is missing:
upgrade.go:369: [debug] warning: Upgrade "airflow" failed: cannot patch "airflow-scheduler" with kind Deployment: Deployment.apps "airflow-scheduler" is invalid: spec.template.spec.containers[2].volumeMounts[0].name: Not found: "dags"
What you think should happen instead
I don’t see any relation between DAG persistence and the git-sync configuration. The same configuration works when credentialsSecret is defined, but it stopped working with sshKeySecret.
How to reproduce
Follow commit https://github.com/apache/airflow/commit/3fc895b9dfe8e7b77538bd80754fb17ccf92db49
Anything else
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project’s Code of Conduct
About this issue
- Original URL
- State: open
- Created 2 years ago
- Reactions: 7
- Comments: 19 (14 by maintainers)
Came to report the same bug with the helm chart version 1.8.0, quite a discussion going on here 😅. For anyone wishing to get a very quick workaround, add the below override to create the missing volume:
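The override itself is not reproduced in this copy of the thread. As a minimal sketch of what such an override might look like, assuming the chart's `scheduler.extraVolumes` value and the `airflow-dags` claim from the configuration in the report, the missing `dags` volume could be declared like this:

```yaml
# Sketch only: declares the "dags" volume that the git-sync volumeMount refers to.
# Assumes the scheduler Deployment is the one failing (as in the error above)
# and that the PVC is the existingClaim from the reported values.
scheduler:
  extraVolumes:
    - name: dags                    # must match the volumeMount name in the error
      persistentVolumeClaim:
        claimName: airflow-dags     # dags.persistence.existingClaim
```

If the chart version in use already renders a `dags` volume for the scheduler, this override would define the volume twice and should be removed.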
I am using GitHub Enterprise. Personally, I want to use both git-sync and a persistent volume for redundancy, as relying solely on git-sync would make GitHub a single point of failure for all my DAGs.
I think a slightly more involved way to go would be to create a git-sync K8s job in CI/CD. That way instead of constantly polling Github, git-sync becomes PR event-driven.
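The event-driven idea above could be sketched as a one-shot Kubernetes Job that a CI/CD pipeline applies after a merge. This is an illustration under several assumptions, not a tested manifest: the Job name and image tag are hypothetical, the environment variables follow git-sync v3 conventions, the SSH key is assumed to live under the `gitSshKey` key of `airflow-ssh-secret`, and the elided repo URL is left as in the report.

```yaml
# Sketch only: runs git-sync once and exits, writing into the DAGs PVC.
apiVersion: batch/v1
kind: Job
metadata:
  name: dags-git-sync-once            # hypothetical name
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      securityContext:
        fsGroup: 65533                # git-sync group, so the SSH key is readable
      containers:
        - name: git-sync
          image: registry.k8s.io/git-sync/git-sync:v3.6.3   # assumed v3 image tag
          env:
            - name: GIT_SYNC_REPO
              value: "ssh://git@..."  # elided in the report; left as-is
            - name: GIT_SYNC_BRANCH
              value: "main"
            - name: GIT_SYNC_DEPTH
              value: "1"
            - name: GIT_SYNC_ROOT
              value: /git
            - name: GIT_SYNC_ONE_TIME # sync once, then exit
              value: "true"
            - name: GIT_SYNC_SSH
              value: "true"
            - name: GIT_SSH_KEY_FILE
              value: /etc/git-secret/ssh
            - name: GIT_KNOWN_HOSTS   # skips host key checking; tighten for production
              value: "false"
          volumeMounts:
            - name: dags
              mountPath: /git
            - name: ssh-key
              mountPath: /etc/git-secret/ssh
              subPath: gitSshKey      # assumed key name inside the secret
              readOnly: true
      volumes:
        - name: dags
          persistentVolumeClaim:
            claimName: airflow-dags
        - name: ssh-key
          secret:
            secretName: airflow-ssh-secret
            defaultMode: 0400
```

Because the claim in the report uses ReadWriteOnce, a Job like this can generally only mount it from the same node as the Airflow pods that already hold it; a ReadWriteMany storage class would avoid that restriction.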
@potiuk, the biggest gap with git-sync is that we effectively have no way to poll less frequently. This isn’t a concern at small scale, but there is a point with many instances using a monorepo where it becomes problematic.
There are probably better solutions to that problem, but polling less frequently when you aren’t on a single LocalExecutor is asking for heartache eventually. Unfortunately, “polling less frequently” is a natural knob to reach for if you hit that case, or if you think you might and try to get ahead of the game.
Relying on (presumably) syncing from GitHub also introduces risk in your production system; persistence is at least “local” to some degree. I’m generally pro git-sync without persistence, but there are definitely scenarios where it isn’t enough.