airflow: Logging bug in long runs

Apache Airflow version: 2.0.2

Environment: Kubernetes v1.18.3 Openshift 4.5.37

What happened: We run our Python code with the Kubernetes pod operator (airflow.contrib.operators.kubernetes_pod_operator). During long runs (>10h) with log streaming enabled (get_logs=True on the operator), Airflow behaves normally at first and then fails with an unexpected error.

If we set get_logs=False, the DAG run succeeds; otherwise we get the same error every time.

Logs:

> [2021-05-18 13:54:10,199] {taskinstance.py:1482} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 696, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 436, in _error_catcher
    yield
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 763, in read_chunked
    self._update_chunk_length()
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 700, in _update_chunk_length
    raise httplib.IncompleteRead(line)
http.client.IncompleteRead: IncompleteRead(0 bytes read)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1138, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1311, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1341, in _execute_task
    result = task_copy.execute(context=context)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 366, in execute
    final_state, _, result = self.create_new_pod_for_operator(labels, launcher)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 513, in create_new_pod_for_operator
    final_state, result = launcher.monitor_pod(pod=self.pod, get_logs=self.get_logs)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/providers/cncf/kubernetes/utils/pod_launcher.py", line 145, in monitor_pod
    for line in logs:
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 807, in __iter__
    for chunk in self.stream(decode_content=True):
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 571, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 792, in read_chunked
    self._original_response.close()
  File "/usr/local/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 454, in _error_catcher
    raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
[2021-05-18 13:54:10,204] {taskinstance.py:1532} INFO - Marking task as FAILED. dag_id=pipline, task_id=task7, execution_date=20210518T132920, start_date=20210518T133244, end_date=20210518T135410
[2021-05-18 13:54:10,280] {local_task_job.py:146} INFO - Task exited with return code 1
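
The innermost frame of the traceback can be reproduced in isolation. The sketch below is not urllib3's code; `read_chunk_length` is a hypothetical helper that mimics how a chunked HTTP reader parses the hexadecimal size line in front of each chunk. When the connection is closed, `readline()` returns an empty byte string, and parsing it produces the same `ValueError` message seen above.

```python
import io


def read_chunk_length(stream):
    """Parse the hex size line that precedes each HTTP chunk (illustrative helper)."""
    line = stream.readline()
    # Chunk extensions, if any, follow a ';' and are ignored here.
    line = line.split(b";", 1)[0]
    return int(line, 16)


# A well-formed chunk header: hex "1a" announces a 26-byte chunk.
healthy = io.BytesIO(b"1a\r\n")
print(read_chunk_length(healthy))

# A closed connection: readline() returns b"" at end of stream,
# so int(b"", 16) raises ValueError.
closed = io.BytesIO(b"")
try:
    read_chunk_length(closed)
except ValueError as exc:
    print(exc)
```

In the traceback, urllib3 converts this `ValueError` into `IncompleteRead` and then into `ProtocolError`, which escapes the `for line in logs:` loop in `monitor_pod` and fails the task.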

We have an Airflow instance on another Kubernetes cluster, where the same code and the same DAGs run without errors.
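
Since the same DAGs succeed elsewhere, the broken connection may come from the environment rather than from the task itself (this is a guess, not something the report confirms). One general way for a log consumer to tolerate a dropped stream is to reopen it and skip lines it has already seen. The sketch below is only an illustration of that idea and not the provider's implementation; `follow_with_retry`, `open_stream`, and `flaky_stream` are hypothetical names, and it assumes that reopening the stream replays the log from the beginning.

```python
def follow_with_retry(open_stream, retry_on, max_retries=3):
    """Yield lines from open_stream(), reopening it when it breaks.

    open_stream: zero-argument callable returning an iterable of lines
                 (assumed to replay from the start each time it is called).
    retry_on:    tuple of exception types that mean "the stream dropped".
    """
    seen = 0       # number of lines already passed to the caller
    failures = 0   # number of times the stream has broken so far
    while True:
        try:
            for index, line in enumerate(open_stream()):
                if index < seen:
                    continue  # already yielded before the reconnect
                seen += 1
                yield line
            return  # stream ended normally
        except retry_on:
            failures += 1
            if failures > max_retries:
                raise  # give up and surface the original error


# Usage with a fake stream that breaks once after two lines.
attempts = {"count": 0}


def flaky_stream():
    attempts["count"] += 1
    yield "line 1"
    yield "line 2"
    if attempts["count"] == 1:
        raise ConnectionError("connection broken")
    yield "line 3"


for log_line in follow_with_retry(flaky_stream, (ConnectionError,)):
    print(log_line)
```

With a real client, `retry_on` would name the exception type that the client raises on a dropped connection, such as the `ProtocolError` in the traceback above.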

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 17 (11 by maintainers)

Most upvoted comments

And I heartily recommend the “search” on the Airflow docs site. It is really fast and really good.

@sg27 Because you are looking in the wrong place. This is a Kubernetes provider fix, not an Airflow one. https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/index.html

@trucnguyenlam -> just upgrade to the latest cncf.kubernetes provider.