airflow: Logging bug in long runs
Apache Airflow version: 2.0.2
Environment: Kubernetes v1.18.3 Openshift 4.5.37
What happened:
We run our Python code in Kubernetes pod operators (`airflow.contrib.operators.kubernetes_pod_operator`).
During long runs (>10h), Airflow with logs turned on (`get_logs=True` in the operator) behaves absolutely normally for hours and then throws an unexpected error.
If we set `get_logs=False`, the DAG run succeeds; otherwise we hit the same error every time.
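For reference, a sketch of how the operator is configured (pod name, namespace, and image are illustrative, not from the real DAG); toggling `get_logs` is the only difference between the failing and succeeding runs:

```python
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

# Illustrative task definition; only the get_logs flag differs between runs.
task = KubernetesPodOperator(
    task_id="task7",                        # task id from the failing run below
    name="long-running-job",                # illustrative pod name
    namespace="default",                    # illustrative namespace
    image="our-registry/pipeline:latest",   # illustrative image
    get_logs=True,  # True: ProtocolError after ~10h; False: DAG run succeeds
)
```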
Logs:
```
[2021-05-18 13:54:10,199] {taskinstance.py:1482} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 696, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 436, in _error_catcher
    yield
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 763, in read_chunked
    self._update_chunk_length()
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 700, in _update_chunk_length
    raise httplib.IncompleteRead(line)
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1138, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1311, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1341, in _execute_task
    result = task_copy.execute(context=context)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 366, in execute
    final_state, _, result = self.create_new_pod_for_operator(labels, launcher)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 513, in create_new_pod_for_operator
    final_state, result = launcher.monitor_pod(pod=self.pod, get_logs=self.get_logs)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/providers/cncf/kubernetes/utils/pod_launcher.py", line 145, in monitor_pod
    for line in logs:
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 807, in __iter__
    for chunk in self.stream(decode_content=True):
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 571, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 792, in read_chunked
    self._original_response.close()
  File "/usr/local/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 454, in _error_catcher
    raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
[2021-05-18 13:54:10,204] {taskinstance.py:1532} INFO - Marking task as FAILED. dag_id=pipline, task_id=task7, execution_date=20210518T132920, start_date=20210518T133244, end_date=20210518T135410
[2021-05-18 13:54:10,280] {local_task_job.py:146} INFO - Task exited with return code 1
```
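The innermost `ValueError` pins down the mechanism: urllib3's chunked-transfer decoder parses each chunk-size line as hexadecimal, and when the long-lived log stream is severed mid-response it receives an empty line instead. A minimal reproduction of just that parsing step (the empty `line` value stands in for what the decoder sees after the connection drops):

```python
# urllib3's _update_chunk_length() parses the chunk-size line as base-16.
# When the connection is cut mid-stream, the line comes back empty, the hex
# parse fails, and urllib3 escalates this to IncompleteRead / ProtocolError.
line = b""  # what the decoder receives once the ~10h log stream is severed

try:
    chunk_left = int(line, 16)  # the exact call that fails in response.py
except ValueError as exc:
    print(exc)  # invalid literal for int() with base 16: b''
```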
We have an Airflow instance on another Kubernetes cluster where we can run the same code with the same DAGs and get no errors.
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 17 (11 by maintainers)
And I heartily recommend the “search” on the Airflow docs site. It's really fast and really good:
@sg27 Because you are looking in the wrong place. This is a Kubernetes provider fix, not Airflow core. https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/index.html
@trucnguyenlam -> just upgrade to the latest cncf.kubernetes provider.