airflow: Duplicate log lines in CloudWatch after upgrade to 2.4.2
Apache Airflow version
2.4.2
What happened
We upgraded Airflow from 2.4.1 to 2.4.2 and immediately noticed that every task log line is duplicated in CloudWatch. Comparing logs from tasks run before and after the upgrade indicates that the issue is not in how Airflow displays the logs; rather, Airflow now produces two log lines instead of one.
When looking at both the CloudWatch log streams and the Airflow UI, we see duplicate log lines for most log entries after the upgrade, while tasks run before the upgrade show single log lines.
This happens both for tasks run remotely via EcsRunTaskOperator and for regular PythonOperator tasks.
What you think should happen instead
A single non-duplicate log line should be produced into CloudWatch.
How to reproduce
As far as I understand, any 2.4.2 setup that uses CloudWatch remote logging will produce duplicate log lines (but I have not been able to confirm this on other setups).
Operating System
Docker: apache/airflow:2.4.2-python3.9
- Running on AWS ECS Fargate
Versions of Apache Airflow Providers
apache-airflow[celery,postgres,apache.hive,jdbc,mysql,ssh,amazon,google,google_auth]==2.4.2
apache-airflow-providers-amazon==6.0.0
Deployment
Other Docker-based deployment
Deployment details
We are running a Docker container on AWS ECS Fargate.
The following environment variables in our CloudFormation configuration control remote logging:
- Name: AIRFLOW__LOGGING__REMOTE_LOGGING
  Value: True
- Name: AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER
  Value: !Sub "cloudwatch://${TasksLogGroup.Arn}"
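For readers less familiar with how these variables reach Airflow: Airflow reads environment variables named AIRFLOW__{SECTION}__{KEY} as overrides for the matching airflow.cfg option, so the two variables above correspond to remote_logging and remote_base_log_folder in the [logging] section. The sketch below illustrates that naming convention with a hypothetical helper, env_to_cfg_option; it is not Airflow's own parser and it ignores the _CMD and _SECRET variants.

```python
def env_to_cfg_option(name):
    """Split a hypothetical AIRFLOW__{SECTION}__{KEY} name into (section, key).

    Illustration only: this is not Airflow's parser and it does not handle
    the _CMD / _SECRET suffix variants.
    """
    prefix = "AIRFLOW__"
    if not name.startswith(prefix):
        raise ValueError(f"not an Airflow config variable: {name}")
    # The first double underscore after the prefix separates section from key.
    section, sep, key = name[len(prefix):].partition("__")
    if not sep or not section or not key:
        raise ValueError(f"expected AIRFLOW__SECTION__KEY, got: {name}")
    # airflow.cfg section and option names are lowercase.
    return section.lower(), key.lower()


for var in ("AIRFLOW__LOGGING__REMOTE_LOGGING", "AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER"):
    print(var, "->", env_to_cfg_option(var))
# AIRFLOW__LOGGING__REMOTE_LOGGING -> ('logging', 'remote_logging')
# AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER -> ('logging', 'remote_base_log_folder')
```

In other words, the deployment above is equivalent to setting remote_logging = True and remote_base_log_folder = cloudwatch://<log group ARN> under [logging] in airflow.cfg.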
Anything else
We did not change any other configuration during the upgrade; we simply bumped the provider requirements and the Docker image from 2.4.1 to 2.4.2.
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project’s Code of Conduct
About this issue
- State: closed
- Created 2 years ago
- Comments: 16 (11 by maintainers)
Fix incoming, and we’re preparing the RC for 2.4.3 this week, so it will be available soon.
@potiuk I’ve made the changes as pointed by you and it worked! 🥳
config/log_config.py
airflow.cfg:
Thanks!
We were on 2.4.1 with provider 6.0.0 and only noticed it when upgrading to 2.4.2 (so I’d say it’s unlikely).
Let’s see what @ashb has to say 😃. I am not sure I know the exact reason why propagate was set to True for “airflow.task” (I know why it was for “airflow.processor”).
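To illustrate why the propagate setting matters here, the sketch below shows one generic way the standard library's logging module can emit every line twice: a child logger and its parent both have a handler that writes to the same destination, and the child propagates its records upward. The logger names "demo" and "demo.task" and the ListHandler class are hypothetical stand-ins (the list plays the role of the CloudWatch stream); this is not a reconstruction of Airflow's actual logging configuration.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted messages in a list, standing in for a remote log stream."""

    def __init__(self, sink):
        super().__init__()
        self.sink = sink

    def emit(self, record):
        self.sink.append(self.format(record))


def emitted_lines(propagate):
    """Log one message via a child logger and return every line that reached the sink."""
    sink = []
    handler = ListHandler(sink)

    # Hypothetical logger names; only the parent/child relationship matters.
    parent = logging.getLogger("demo")
    child = logging.getLogger("demo.task")
    for lg in (parent, child):
        lg.handlers.clear()
        lg.setLevel(logging.INFO)
    parent.propagate = False  # keep the demo isolated from the real root logger

    # Both loggers ship to the same destination.
    parent.addHandler(handler)
    child.addHandler(handler)
    child.propagate = propagate

    child.info("task started")
    return sink


print(emitted_lines(propagate=True))   # ['task started', 'task started']
print(emitted_lines(propagate=False))  # ['task started']
```

With propagate=True the record is handled once by the child's handler and again by the parent's, so the shared sink receives two copies; with propagate=False processing stops at the child.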