airflow: KubernetesPodOperator breaks with active log-collection for long running tasks

I’m encountering the same bug reported in https://issues.apache.org/jira/browse/AIRFLOW-3534, with airflow 1.10.12.

[2020-11-06 13:03:29,672] {pod_launcher.py:173} INFO - Event: fetcher-56104d81c54946a88ce3cd1cf4273477 had an event of type Pending
[2020-11-06 13:03:29,673] {pod_launcher.py:139} WARNING - Pod not yet started: fetcher-56104d81c54946a88ce3cd1cf4273477
[2020-11-06 13:03:30,681] {pod_launcher.py:173} INFO - Event: fetcher-56104d81c54946a88ce3cd1cf4273477 had an event of type Pending
[2020-11-06 13:03:30,681] {pod_launcher.py:139} WARNING - Pod not yet started: fetcher-56104d81c54946a88ce3cd1cf4273477
[2020-11-06 13:03:31,692] {pod_launcher.py:173} INFO - Event: fetcher-56104d81c54946a88ce3cd1cf4273477 had an event of type Pending
[2020-11-06 13:03:31,692] {pod_launcher.py:139} WARNING - Pod not yet started: fetcher-56104d81c54946a88ce3cd1cf4273477
[2020-11-06 13:03:32,702] {pod_launcher.py:173} INFO - Event: fetcher-56104d81c54946a88ce3cd1cf4273477 had an event of type Running
[2020-11-06 13:04:32,740] {taskinstance.py:1150} ERROR - ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
Traceback (most recent call last):
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/urllib3/response.py", line 696, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/urllib3/response.py", line 436, in _error_catcher
    yield
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/urllib3/response.py", line 763, in read_chunked
    self._update_chunk_length()
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/urllib3/response.py", line 700, in _update_chunk_length
    raise httplib.IncompleteRead(line)
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 979, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/airflow/contrib/operators/kubernetes_pod_operator.py", line 284, in execute
    final_state, _, result = self.create_new_pod_for_operator(labels, launcher)
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/airflow/contrib/operators/kubernetes_pod_operator.py", line 403, in create_new_pod_for_operator
    final_state, result = launcher.monitor_pod(pod=pod, get_logs=self.get_logs)
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/airflow/kubernetes/pod_launcher.py", line 155, in monitor_pod
    for line in logs:
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/urllib3/response.py", line 807, in __iter__
    for chunk in self.stream(decode_content=True):
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/urllib3/response.py", line 571, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/urllib3/response.py", line 792, in read_chunked
    self._original_response.close()
  File "/opt/bitnami/python/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/urllib3/response.py", line 454, in _error_catcher
    raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
[2020-11-06 13:04:32,743] {taskinstance.py:1194} INFO - Marking task as UP_FOR_RETRY. dag_id=..., task_id=..., execution_date=20201106T120000, start_date=20201106T130329, end_date=20201106T130432
[2020-11-06 13:04:34,641] {local_task_job.py:102} INFO - Task exited with return code 1

The bug goes away by setting get_logs=False in the KubernetesPodOperator. Reproduced with multiple dags and tasks.

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 4
  • Comments: 32 (16 by maintainers)

Commits related to this issue

Most upvoted comments

What? EarnestKubernetesPodOperator does not exist in Airflow Codebase. You must have some local modifications in your code.

Can you please give me a example?

@dmateusp I found an issue with the above fix. When running with the followings arguments, only the first print() is logged, the later ones are discarded:

cmds=["bash", "-c"],
arguments=[
    'python -c "print(\\"hello\\")" && sleep 120 && python -c "print(\\"hello again\\")" && sleep 30 && python -c "print(\\"last hello\\")" && sleep 30'
]

Log output:

[2021-01-07 16:20:55,327] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:21:55,331] {pod_launcher.py:156} INFO - b'hello\n'
[2021-01-07 16:21:55,332] {sample_dag.py:26} INFO - The pod has not logged since the logs were last fetched
[2021-01-07 16:21:55,365] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:21:55,365] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:21:57,374] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:21:57,375] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:21:59,385] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:21:59,385] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:01,398] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:01,398] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:03,409] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:03,409] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:05,420] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:05,420] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:07,435] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:07,436] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:09,451] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:09,452] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:11,463] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:11,463] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:13,474] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:13,474] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:15,495] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:15,495] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:17,506] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:17,506] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:19,527] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:19,527] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:21,538] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:21,538] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:23,549] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:23,549] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:25,559] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:25,559] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:27,572] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:27,572] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:29,588] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:29,588] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:31,598] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:31,598] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:33,608] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:33,608] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:35,619] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:35,619] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:37,630] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:37,630] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:39,641] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:39,642] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:41,650] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:41,650] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:43,662] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:43,662] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:45,672] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:45,672] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:47,683] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:47,683] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:49,693] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:49,693] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:51,707] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:51,707] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:53,719] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:53,719] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:55,728] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:55,729] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:57,740] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:57,740] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:22:59,751] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:22:59,751] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:23:01,766] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:23:01,766] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:23:03,776] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:23:03,776] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:23:05,794] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:23:05,794] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:23:07,807] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:23:07,808] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:23:09,819] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:23:09,820] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:23:11,832] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:23:11,832] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:23:13,846] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:23:13,846] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:23:15,858] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:23:15,858] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:23:17,872] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:23:17,872] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:23:19,884] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:23:19,885] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:23:21,897] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:23:21,897] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:23:23,907] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:23:23,907] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:23:25,919] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:23:25,920] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:23:27,928] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:23:27,928] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:23:29,941] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:23:29,941] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:23:31,953] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:23:31,953] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:23:33,968] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:23:33,968] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:23:35,979] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:23:35,979] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:23:37,991] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:23:37,991] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:23:40,002] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:23:40,002] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:23:42,014] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:23:42,014] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:23:44,026] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:23:44,026] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:23:46,039] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:23:46,039] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:23:48,050] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:23:48,051] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:23:50,061] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:23:50,061] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:23:52,071] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:23:52,071] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:23:54,082] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Running
[2021-01-07 16:23:54,082] {pod_launcher.py:166} INFO - Pod test-2a1b09d88b9340a291751b9c1f536d7b has state running
[2021-01-07 16:23:56,095] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Succeeded
[2021-01-07 16:23:56,095] {pod_launcher.py:287} INFO - Event with job id test-2a1b09d88b9340a291751b9c1f536d7b Succeeded
[2021-01-07 16:23:56,106] {pod_launcher.py:173} INFO - Event: test-2a1b09d88b9340a291751b9c1f536d7b had an event of type Succeeded
[2021-01-07 16:23:56,106] {pod_launcher.py:287} INFO - Event with job id test-2a1b09d88b9340a291751b9c1f536d7b Succeeded
[2021-01-07 16:23:56,158] {taskinstance.py:1070} INFO - Marking task as SUCCESS.dag_id=sample_dag_2, task_id=kubernetes_task_2, execution_date=20210107T162031, start_date=20210107T162051, end_date=20210107T162356
[2021-01-07 16:23:57,928] {local_task_job.py:102} INFO - Task exited with return code 0

Expected log output, when using the following arguments (reduced sleep time to avoid IncompleteRead):

'python -c "print(\\"hello\\")" && sleep 5 && python -c "print(\\"hello again\\")" && sleep 5 && python -c "print(\\"last hello\\")" && sleep 5'
[2021-01-07 16:35:41,059] {pod_launcher.py:173} INFO - Event: test-3a5e94a9eb344c2786e583024c9ccdfc had an event of type Pending
[2021-01-07 16:35:41,059] {pod_launcher.py:139} WARNING - Pod not yet started: test-3a5e94a9eb344c2786e583024c9ccdfc
[2021-01-07 16:35:42,072] {pod_launcher.py:173} INFO - Event: test-3a5e94a9eb344c2786e583024c9ccdfc had an event of type Running
[2021-01-07 16:35:57,773] {pod_launcher.py:156} INFO - b'hello\n'
[2021-01-07 16:35:57,774] {pod_launcher.py:156} INFO - b'hello again\n'
[2021-01-07 16:35:57,774] {pod_launcher.py:156} INFO - b'last hello\n'
[2021-01-07 16:35:57,790] {pod_launcher.py:173} INFO - Event: test-3a5e94a9eb344c2786e583024c9ccdfc had an event of type Succeeded
[2021-01-07 16:35:57,791] {pod_launcher.py:287} INFO - Event with job id test-3a5e94a9eb344c2786e583024c9ccdfc Succeeded
[2021-01-07 16:35:57,798] {pod_launcher.py:173} INFO - Event: test-3a5e94a9eb344c2786e583024c9ccdfc had an event of type Succeeded
[2021-01-07 16:35:57,798] {pod_launcher.py:287} INFO - Event with job id test-3a5e94a9eb344c2786e583024c9ccdfc Succeeded
[2021-01-07 16:35:57,844] {taskinstance.py:1070} INFO - Marking task as SUCCESS.dag_id=sample_dag_2, task_id=kubernetes_task_2, execution_date=20210107T163523, start_date=20210107T163538, end_date=20210107T163557

@ImNiceMan Earnest is the name of the company that Daniel (@dmateusp) works for. He made a subclass of KubernetesPodOperator to patch the issue, and he put the name of the company on the front of the subclass name to indicate “this is a custom patch that we are using; not a part of the actual Kubernetes library.”

So you can’t import it, but you can make your own subclass in your code that fixes the issue the same way he did.

What? EarnestKubernetesPodOperator does not exist in Airflow Codebase. You must have some local modifications in your code.

the version of the airflow cncf provider package ->

https://pypi.org/project/apache-airflow-providers-cncf-kubernetes/3.0.0/

Hi @mikaeld, We have 2 clusters and 3 instances - latest setup I ran into it

cluster 1 (prod) airflow-2.2.2 apache-airflow-providers-cncf-kubernetes 2.1.0 -> 5 times in the last month

cluster 2 (test) airflow-2.2.3 apache-airflow-providers-cncf-kubernetes 2.2.0 - not noticed so far

cluster 2 (dev) airflow-2.2.3 apache-airflow-providers-cncf-kubernetes 2.2.0 -> one time in the last month

Yes I’m having this issue running Airflow 1.10.12 with KubernetesPodOperator which is running a queries packaged in a Docker image. We are running long-running queries where the queries will be running and producing the logs in Airflow for 2-4 hours and then at 3hrs 59 minutes of query logs we would just get:

INFO - Event <pod_name> had an event of type Running
INFO - Event: <pod_name> has state running

repeated every 2 seconds for 2.5 to 3.5 hours until a failed state.

INFO - Event <pod_name> had an event of type Failed

Is anyone still having the issue mentioned by @yamrzou , I’m using airflow 1.10.14 and have a similar issue where the logs of a long running task are similar to #12136 (comment).

I was able to reproduce this issue after a few days of letting Bash containers run from KubernetesPodOperator, and implement a fix:

I ran two bash tasks, one with the fix (EarnestKubernetesPodOperator) and one without (KubernetesPodOperator):

    bash_echo_1 = KubernetesPodOperator(
        task_id="bash_echo_1",
        image="bash:4.4",
        name="bash-echo-1",
        cmds=[
            "bash",
            "-c",
            "while true; do sleep 5; done;",
        ],
        do_xcom_push=True,
        get_logs=True,
        **t.DEFAULT_POD_ARGS,
    )

    bash_echo_2 = EarnestKubernetesPodOperator(
        task_id="bash_echo_2",
        name="bash-echo-2",
        image="bash:4.4",
        cmds=["bash", "-c", "while true; do sleep 5; done;",],
        get_logs=True,
        **t.DEFAULT_POD_ARGS,
    )

I let both of these run for a few days and the one without the fix failed a few times while the one with the fix never failed:

Screen Shot 2021-01-04 at 14 07 50

^ First row is bash_echo_1 and second row is bash_echo_2

In the logs of bash_echo_1 we can see that the issue was reproduced after more than a day running:

[2020-12-31 03:00:36,636] {logging_mixin.py:112} INFO - [2020-12-31 03:00:36,635] {pod_launcher.py:142} INFO - Event: bash-echo-1-3a123499 had an event of type Running
[2021-01-01 06:12:42,517] {taskinstance.py:1145} ERROR - ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.7/site-packages/urllib3/response.py", line 685, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.7/site-packages/urllib3/response.py", line 425, in _error_catcher
    yield
  File "/home/airflow/.local/lib/python3.7/site-packages/urllib3/response.py", line 752, in read_chunked
    self._update_chunk_length()
  File "/home/airflow/.local/lib/python3.7/site-packages/urllib3/response.py", line 689, in _update_chunk_length
    raise httplib.IncompleteRead(line)
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/contrib/operators/kubernetes_pod_operator.py", line 251, in execute
    get_logs=self.get_logs)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/contrib/kubernetes/pod_launcher.py", line 117, in run_pod
    return self._monitor_pod(pod, get_logs)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/contrib/kubernetes/pod_launcher.py", line 124, in _monitor_pod
    for line in logs:
  File "/home/airflow/.local/lib/python3.7/site-packages/urllib3/response.py", line 796, in __iter__
    for chunk in self.stream(decode_content=True):
  File "/home/airflow/.local/lib/python3.7/site-packages/urllib3/response.py", line 560, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/home/airflow/.local/lib/python3.7/site-packages/urllib3/response.py", line 781, in read_chunked
    self._original_response.close()
  File "/usr/local/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/airflow/.local/lib/python3.7/site-packages/urllib3/response.py", line 443, in _error_catcher
    raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

While the same issue happened in bash_echo_2 but it was handled by the change in this PR

[2020-12-31 03:00:36,592] {logging_mixin.py:112} INFO - [2020-12-31 03:00:36,592] {kubernetes_pod_operator.py:270} INFO - Event: bash-echo-2-72839c01adc84a29b3903527d6e6dfba had an event of type Running
[2021-01-01 06:20:38,083] {logging_mixin.py:112} INFO - [2021-01-01 06:20:38,082] {kubernetes_pod_operator.py:248} INFO - The pod has not logged since the logs were last fetched
[2021-01-01 06:20:38,373] {logging_mixin.py:112} INFO - [2021-01-01 06:20:38,372] {kubernetes_pod_operator.py:270} INFO - Event: bash-echo-2-72839c01adc84a29b3903527d6e6dfba had an event of type Running

The only change was to add error handling around self.read_pod_logs in the KubernetesPodOperator:

            try:
                logs = self.read_pod_logs(pod)
                for line in logs:
                    self.log.info(line)
            except ProtocolError as pe:
                _, protocol_exception = pe.args
                # When no logs are fetched, an IncompleteRead is thrown trying
                # to decode the stream
                if str(protocol_exception) == "IncompleteRead(0 bytes read)":
                    self.log.info(
                        "The pod has not logged since the logs were last fetched"
                    )
                # If the exception is not about an empty stream we raise it
                else:
                    raise e

I’ll open a PR asap 🏃