airflow: triggerer block_watchdog is crying "wolf!"
Apache Airflow version
Other Airflow 2 version (please specify below)
What happened
The triggerer emits “Triggerer’s async thread was blocked for x.xx seconds” messages even on a cluster where none of the DAGs have been updated to use deferred mode for operators, so there is no load on the triggerer.
I have a detailed analysis of why.
In the “Anything Else” section, I am including a series of logs showing the problem, under differing conditions. But they all share the same condition that there is no load on the triggerer.
The issue is the triggerer’s block_watchdog
async function, which hardcodes a timeout of 0.2 seconds for consecutive runs of the async function. This especially does not play well if you set in kubernetes spec.template.spec.containers[].resources.limits.cpu
. For Google Composer, all pods on GKE Autopilot must have a cpu limit set, but it’s an issue for any k8s deployment that sets a low cpu limit.
The k8s resources.limits.cpu
feature makes use of the Linux kernel’s cgroups cpu throttling feature. There is a cpu period
, typically 100ms, and if all the threads in a container consume more than their allotted cpu time, the container is not scheduled onto a cpu for the rest of the period. So, if you have a cpu limit of 0.5, and over the course of 25ms, 50ms of cpu time is used by the threads in the container, then the container will not run for the next 75ms. This introduces latency into the threads.
An article about it: https://medium.com/@ramandumcs/cpu-throttling-unbundled-eae883e7e494
This is not the only problem. If multiple threads share one cpu, then the context-switching to the other threads also introduces latency into the threads. For the triggerer container, we have these threads of execution:
- The thread running the async loop.
- The main thread, running the
_run_trigger_loop
function. - The kubernetes configured livenessProbe for the container.
- gunicorn master and worker processes.
A hardcoded 200ms is a “real-time” constraint. In order to satisfy such a real-time constraint, you would actually need multiple physical cpu-threads, so that the other concurrent threads do not introduce latencies, and you’d need no cpu throttling enabled.
But we’re doing batch jobs, not real-time stock trading or something.
What you think should happen instead
Since Airflow is a batch job processing system, we should not have expensive real-time constraints that require you to dedicate multiple CPUs to the triggerer. Instead:
- Make the block_watchdog timeout configurable in airflow.cfg
- Have a default on the order of seconds, not hundreds of milliseconds.
- In the Helm chart, add comments about how the triggerer cpu limit affects triggerer block_watchdog latency.
How to reproduce
Have the helm chart values set a triggerer limit of 0.5 or 1.0 cpu, and let it run for a day. You should see similar log messages as I’m including.
Operating System
NAME=“Ubuntu” VERSION=“20.04.6 LTS (Focal Fossa)” ID=ubuntu ID_LIKE=debian PRETTY_NAME=“Ubuntu 20.04.6 LTS” VERSION_ID=“20.04” HOME_URL=“https://www.ubuntu.com/” SUPPORT_URL=“https://help.ubuntu.com/” BUG_REPORT_URL=“https://bugs.launchpad.net/ubuntu/” PRIVACY_POLICY_URL=“https://www.ubuntu.com/legal/terms-and-policies/privacy-policy” VERSION_CODENAME=focal UBUNTU_CODENAME=focal
Versions of Apache Airflow Providers
No response
Deployment
Google Cloud Composer
Deployment details
composer-2-4-6-airflow-2-6-3
Anything else
0.5 CPU limit
{
"timestamp": "2023-11-10T00:43:09.627748420Z",
"textPayload": "Triggerer's async thread was blocked for 1.33 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-10T03:23:03.509407823Z",
"textPayload": "Triggerer's async thread was blocked for 1.11 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-10T05:19:03.527455640Z",
"textPayload": "Triggerer's async thread was blocked for 0.81 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-10T07:15:46.782236976Z",
"textPayload": "Triggerer's async thread was blocked for 0.45 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-10T09:12:34.504077450Z",
"textPayload": "Triggerer's async thread was blocked for 0.42 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-10T11:09:25.525505471Z",
"textPayload": "Triggerer's async thread was blocked for 0.46 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-10T13:06:16.307688459Z",
"textPayload": "Triggerer's async thread was blocked for 0.96 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-10T15:28:10.720029804Z",
"textPayload": "Triggerer's async thread was blocked for 1.29 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-10T17:24:15.113605243Z",
"textPayload": "Triggerer's async thread was blocked for 0.81 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
1.0 cpu limit
{
"timestamp": "2023-11-10T19:50:40.023219655Z",
"textPayload": "Triggerer's async thread was blocked for 0.57 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-10T21:42:45.592743140Z",
"textPayload": "Triggerer's async thread was blocked for 0.22 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-10T23:35:31.606070073Z",
"textPayload": "Triggerer's async thread was blocked for 0.21 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-11T01:28:24.456517186Z",
"textPayload": "Triggerer's async thread was blocked for 0.44 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-11T03:21:01.439348532Z",
"textPayload": "Triggerer's async thread was blocked for 0.30 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-11T05:13:32.570751960Z",
"textPayload": "Triggerer's async thread was blocked for 0.21 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-11T07:05:51.271996064Z",
"textPayload": "Triggerer's async thread was blocked for 0.24 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-11T10:50:37.726952059Z",
"textPayload": "Triggerer's async thread was blocked for 0.21 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-11T12:43:03.944972600Z",
"textPayload": "Triggerer's async thread was blocked for 0.24 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-11T14:35:29.019331992Z",
"textPayload": "Triggerer's async thread was blocked for 0.34 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-11T16:28:20.008302527Z",
"textPayload": "Triggerer's async thread was blocked for 0.62 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-11T22:06:00.143154881Z",
"textPayload": "Triggerer's async thread was blocked for 0.23 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-11T23:58:26.327818028Z",
"textPayload": "Triggerer's async thread was blocked for 0.41 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-12T03:43:12.151969113Z",
"textPayload": "Triggerer's async thread was blocked for 0.59 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-12T05:35:38.605106362Z",
"textPayload": "Triggerer's async thread was blocked for 0.24 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-12T07:28:09.902736229Z",
"textPayload": "Triggerer's async thread was blocked for 0.23 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-12T09:20:40.018579247Z",
"textPayload": "Triggerer's async thread was blocked for 0.22 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-12T11:13:12.791422256Z",
"textPayload": "Triggerer's async thread was blocked for 0.78 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-12T13:05:43.599749227Z",
"textPayload": "Triggerer's async thread was blocked for 0.29 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-12T16:50:34.846483498Z",
"textPayload": "Triggerer's async thread was blocked for 0.24 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-12T18:43:07.569643303Z",
"textPayload": "Triggerer's async thread was blocked for 0.24 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
1.0 cpu, livenessProbe removed from the Deployment
{
"timestamp": "2023-11-12T22:16:11.391819276Z",
"textPayload": "Triggerer's async thread was blocked for 0.37 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-13T00:07:53.229342739Z",
"textPayload": "Triggerer's async thread was blocked for 0.24 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-13T03:53:02.842909571Z",
"textPayload": "Triggerer's async thread was blocked for 0.26 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-13T05:45:36.218492337Z",
"textPayload": "Triggerer's async thread was blocked for 0.33 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-13T07:38:00.747163337Z",
"textPayload": "Triggerer's async thread was blocked for 0.20 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-13T09:30:33.190422037Z",
"textPayload": "Triggerer's async thread was blocked for 0.25 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-13T11:23:15.590603839Z",
"textPayload": "Triggerer's async thread was blocked for 0.27 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-13T15:08:49.891520932Z",
"textPayload": "Triggerer's async thread was blocked for 0.25 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
{
"timestamp": "2023-11-13T18:54:06.744651454Z",
"textPayload": "Triggerer's async thread was blocked for 0.21 seconds, likely due to the highly utilized environment. In case of Warnings informing you about slow callback consider visiting https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer."
}
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project’s Code of Conduct
About this issue
- Original URL
- State: closed
- Created 7 months ago
- Comments: 19 (9 by maintainers)
OK, I’m working on setting up my local environment so that pre-commit, etc, is enabled, before submitting.