airflow: Log files are still being cached causing ever-growing memory usage when scheduler is running
Apache Airflow version
2.4.1
What happened
My Airflow scheduler memory usage started to grow after I turned on the dag_processor_manager log by doing
export CONFIG_PROCESSOR_MANAGER_LOGGER=True
[Screenshot: scheduler memory usage, annotated with a red arrow]
By looking closely at the memory usage as described in https://github.com/apache/airflow/issues/16737#issuecomment-917677177, I discovered that it was the cache memory that kept growing.
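For anyone who wants to make the same cache-vs-RSS split visible from inside the container, here is a rough sketch. It assumes the container exposes a cgroup `memory.stat` file; the path below is the usual cgroup v1 location and is an assumption, not something taken from this issue. The helper names are made up for illustration.

```python
from pathlib import Path
from typing import Dict, Tuple

# Assumed cgroup v1 location; cgroup v2 usually exposes /sys/fs/cgroup/memory.stat instead.
MEMORY_STAT = Path("/sys/fs/cgroup/memory/memory.stat")


def parse_memory_stat(text: str) -> Dict[str, int]:
    """Parse the 'key value' lines of a cgroup memory.stat file into a dict."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def cache_and_rss_mib(stats: Dict[str, int]) -> Tuple[float, float]:
    """Return (page cache, anonymous memory) in MiB."""
    mib = 1024 * 1024
    # cgroup v1 calls these fields "cache" and "rss"; cgroup v2 calls them "file" and "anon".
    cache = stats.get("cache", stats.get("file", 0))
    rss = stats.get("rss", stats.get("anon", 0))
    return cache / mib, rss / mib


if __name__ == "__main__":
    if MEMORY_STAT.exists():
        cache, rss = cache_and_rss_mib(parse_memory_stat(MEMORY_STAT.read_text()))
        print(f"cache={cache:.1f} MiB rss={rss:.1f} MiB")
    else:
        print(f"{MEMORY_STAT} not found; adjust the path for your cgroup setup")
```

If the cache figure keeps climbing while RSS stays flat, the growth is page cache rather than memory held by the scheduler process itself.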
Then I turned off the dag_processor_manager log, and memory usage returned to normal (no longer growing, steady at ~400 MB).
This issue is similar to #14924 and #16737. This time the culprit is the rotating logs under ~/logs/dag_processor_manager/dag_processor_manager.log*.
What you think should happen instead
Cache memory shouldn’t keep growing like this
How to reproduce
Turn on the dag_processor_manager log by doing
export CONFIG_PROCESSOR_MANAGER_LOGGER=True
in the entrypoint.sh, then monitor the scheduler memory usage.
Operating System
Debian GNU/Linux 10 (buster)
Versions of Apache Airflow Providers
No response
Deployment
Other Docker-based deployment
Deployment details
k8s
Anything else
I’m not sure why the previous fix https://github.com/apache/airflow/pull/18054 has stopped working 🤔
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project’s Code of Conduct
About this issue
- State: closed
- Created 2 years ago
- Comments: 25 (25 by maintainers)
Commits related to this issue
- Make RotatingFilehandler used in DagProcessor non-caching The RotatingFileHandler is used when you enable it via `CONFIG_PROCESSOR_MANAGER_LOGGER=True` and it exhibits similar behaviour as the FileHa... — committed to potiuk/airflow by potiuk 2 years ago
- Make RotatingFilehandler used in DagProcessor non-caching (#27223) The RotatingFileHandler is used when you enable it via `CONFIG_PROCESSOR_MANAGER_LOGGER=True` and it exhibits similar behaviour as... — committed to apache/airflow by potiuk 2 years ago
- Make RotatingFilehandler used in DagProcessor non-caching (#27223) The RotatingFileHandler is used when you enable it via `CONFIG_PROCESSOR_MANAGER_LOGGER=True` and it exhibits similar behaviour as t... — committed to apache/airflow by potiuk 2 years ago
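To give an idea of what "non-caching" can mean for a rotating handler, here is a minimal sketch. It assumes the approach is to advise the kernel with `os.posix_fadvise(..., POSIX_FADV_DONTNEED)` each time the log file is opened; the class and function names are illustrative, and this is not claimed to match the linked commits line for line.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def make_file_io_non_caching(stream):
    """Advise the kernel that cached pages of this file are not needed."""
    try:
        fd = stream.fileno()
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    except Exception:
        # posix_fadvise is missing on some platforms, and the stream may have
        # no file descriptor. It is only advice, so ignore any failure.
        pass
    return stream


class NonCachingRotatingFileHandler(RotatingFileHandler):
    """RotatingFileHandler that issues the advice on every (re)open."""

    def _open(self):
        # _open() runs on first use and again after each rollover.
        return make_file_io_non_caching(super()._open())


def demo() -> str:
    """Write one record to a temporary log file and return the file's content."""
    path = os.path.join(tempfile.mkdtemp(), "dag_processor_manager.log")
    handler = NonCachingRotatingFileHandler(path, maxBytes=1024 * 1024, backupCount=2)
    logger = logging.getLogger("non_caching_demo")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.info("hello from the non-caching handler")
    handler.close()
    logger.removeHandler(handler)
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    print(demo())
```

The advice is best effort: the kernel is free to ignore it, so this reduces page cache growth from log files rather than guaranteeing a hard cap.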
worse than a red herring, this is a mirage 😆 can’t change the kernel’s behavior, i’ll just change meself:
this makes the cache memory usage cap at 40~50 MB
It always depends on configuration and monitoring. I personally had this issue, possibly in Airflow 2.1.x, and I do not know whether it was actually related to Airflow itself or to something else. Working with EFS definitely takes more effort than GitSync.
For anyone who finds this thread in the future while dealing with EFS performance degradation, this might help:
Disable saving Python bytecode inside the NFS (AWS EFS) mount: either set PYTHONDONTWRITEBYTECODE=x, or redirect the bytecode with PYTHONPYCACHEPREFIX, for example to /tmp/pycaches
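A quick way to check that these variables actually reach the interpreter is to start a child Python with them set and look at `sys.dont_write_bytecode` and `sys.pycache_prefix` (the latter needs Python 3.8+). This is just a sketch; the helper name `bytecode_settings` is made up for illustration.

```python
import json
import os
import subprocess
import sys


def bytecode_settings(env_overrides):
    """Run a child interpreter with the given env vars and return
    [sys.dont_write_bytecode, sys.pycache_prefix] as seen by that child."""
    # Start from the current environment, minus any bytecode settings already present.
    env = {
        key: value
        for key, value in os.environ.items()
        if key not in ("PYTHONDONTWRITEBYTECODE", "PYTHONPYCACHEPREFIX")
    }
    env.update(env_overrides)
    code = "import json, sys; print(json.dumps([sys.dont_write_bytecode, sys.pycache_prefix]))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    print(bytecode_settings({"PYTHONDONTWRITEBYTECODE": "x"}))
    print(bytecode_settings({"PYTHONPYCACHEPREFIX": "/tmp/pycaches"}))
```

With the first setting no .pyc files are written at all; with the second they are still written, but under the prefix directory instead of next to the DAG files on the mount.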
Throughput in Bursting mode looks like a miracle at first, but once all the bursting capacity drops to zero it can turn your life into hell. Each newly created EFS share has about 2.1 TB BurstingCreditBalance.
What could be done here: