airflow: Duplicate log lines in CloudWatch after upgrade to 2.4.2

Apache Airflow version

2.4.2

What happened

We upgraded Airflow from 2.4.1 to 2.4.2 and immediately noticed that every task log line is duplicated in CloudWatch. Comparing logs from tasks run before and after the upgrade indicates that the issue is not in how the logs are displayed in Airflow; rather, Airflow now produces two log lines instead of one.

In both the CloudWatch log streams and the Airflow UI, we see duplicate log lines for most log entries after the upgrade, while tasks run before the upgrade show single log lines.

This happens both for tasks run by a remote EcsRunTaskOperator and for tasks run by a regular PythonOperator.

What you think should happen instead

A single non-duplicate log line should be produced into CloudWatch.

How to reproduce

As far as I understand, any setup on 2.4.2 that uses CloudWatch remote logging will produce duplicate log lines, but I have not been able to confirm this on other setups.
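To check another setup for the same symptom, one option is to export a task log as text and count lines that repeat back to back. The sketch below is only an illustration: `adjacent_duplicates` is a hypothetical helper, not part of Airflow, and it assumes the duplicated lines are byte-identical and adjacent, which may not hold if each copy carries its own timestamp.

```python
from itertools import groupby


def adjacent_duplicates(log_text: str) -> list[tuple[str, int]]:
    """Return (line, count) for every non-blank line repeated consecutively."""
    # Ignore blank lines so empty separators are not reported as duplicates.
    lines = [line for line in log_text.splitlines() if line.strip()]
    repeated = []
    # groupby collapses each run of identical consecutive lines into one group.
    for line, run in groupby(lines):
        count = len(list(run))
        if count > 1:
            repeated.append((line, count))
    return repeated


if __name__ == "__main__":
    sample = "task started\ntask started\nrunning step\ndone\ndone\n"
    for line, count in adjacent_duplicates(sample):
        print(f"{count}x {line}")
```

An empty result on a 2.4.1 task log and a non-empty one on a 2.4.2 task log would match the behaviour described in this report.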

Operating System

Docker: apache/airflow:2.4.2-python3.9 - Running on AWS ECS Fargate

Versions of Apache Airflow Providers

apache-airflow[celery,postgres,apache.hive,jdbc,mysql,ssh,amazon,google,google_auth]==2.4.2
apache-airflow-providers-amazon==6.0.0

Deployment

Other Docker-based deployment

Deployment details

We are running a Docker container on ECS Fargate on AWS.

The following environment variables + config in CloudFormation control remote logging:

            - Name: AIRFLOW__LOGGING__REMOTE_LOGGING
              Value: True
            - Name: AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER
              Value: !Sub "cloudwatch://${TasksLogGroup.Arn}"

Anything else

We did not change any other configuration during the upgrade; we simply bumped the requirements for the provider list and the Docker image from 2.4.1 to 2.4.2.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 16 (11 by maintainers)

Most upvoted comments

A fix is incoming, and we’re preparing the RC for 2.4.3 this week, so it will be available soon.

@potiuk I’ve made the changes you pointed out and it worked! 🥳

config/log_config.py

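from copy import deepcopy

from airflow.config_templates.airflow_local_settings import (
    DEFAULT_LOGGING_CONFIG,
)

# Work on a copy so Airflow's default logging config is not mutated in place.
LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG)
# Stop "airflow.task" records from also being passed to ancestor loggers' handlers.
LOGGING_CONFIG["loggers"]["airflow.task"]["propagate"] = False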

airflow.cfg:

[logging]
logging_config_class = log_config.LOGGING_CONFIG

Thanks!

Was the cause upgrading to 2.4.2, or is it possible this is related to the 6.0.0 release of Amazon provider?

We were on 2.4.1 w/ 6.0.0 and noticed it when upgrading to 2.4.2. (So I’d say it’s unlikely)

Let’s see what @ashb has to say 😃. I am not sure I know the exact reason why propagate was set to True for “airflow.task” (I know why it was for “airflow.processor”).
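For readers unfamiliar with the `propagate` flag discussed above, here is a minimal sketch using only the standard `logging` module. It shows how a record can be written twice when the same handler is reachable from both a logger and its parent. The logger names `demo` and `demo.task` and the function `count_emitted_lines` are made up for this illustration; the thread does not confirm that this is exactly how the duplication arises inside Airflow 2.4.2.

```python
import io
import logging


def count_emitted_lines(propagate: bool) -> int:
    """Log one message on a child logger and count the lines written."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)

    parent = logging.getLogger("demo")       # hypothetical parent logger
    child = logging.getLogger("demo.task")   # hypothetical child logger
    for logger in (parent, child):
        logger.handlers.clear()              # reset state between calls
        logger.setLevel(logging.INFO)
        logger.addHandler(handler)           # same handler on both loggers

    parent.propagate = False                 # keep the demo away from the root logger
    child.propagate = propagate

    child.info("hello from the task")
    return len(stream.getvalue().splitlines())


if __name__ == "__main__":
    print("propagate=True :", count_emitted_lines(True))
    print("propagate=False:", count_emitted_lines(False))
```

With propagation enabled, the child's own handler writes the record and the parent's handler then writes it again; disabling propagation on the child leaves a single line, which is the effect the `log_config.py` workaround aims for on `airflow.task`.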