mlflow: [BUG] mlflow.search_runs() keeps failing with InvalidChunkLength exception

Issues Policy acknowledgement

  • I have read and agree to submit bug reports in accordance with the issues policy

Willingness to contribute

No, I cannot contribute a fix for this bug at this time.

MLflow version

  • Client: 2.0.1
  • Tracking server: 1.27.0

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 22.04.1 LTS
  • Python version: 3.10.6
  • yarn version, if running the dev UI:

Describe the problem

mlflow.search_runs(experiment_ids=[<XXX>]) fails with the exception: mlflow.exceptions.MlflowException: API request to http://<XXX>/api/2.0/mlflow/runs/search failed with exception ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))

This used to work, and it still succeeds about once in a dozen attempts. Logging works.

Tracking information

Code to reproduce issue

import mlflow

mlflow.set_tracking_uri(<XXX>)
mlflow.set_experiment(<XXX>)
runs = mlflow.search_runs(experiment_ids=[<XXX>])
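
Because the call still succeeds occasionally, a retry loop may work as a stopgap until the root cause is found. This is only a minimal sketch, not a fix: `call_with_retries` and its parameters are hypothetical names introduced here, and it retries any callable rather than anything MLflow-specific.

```python
import time


def call_with_retries(fn, attempts=5, delay_seconds=1.0, retry_on=(Exception,)):
    """Call fn() until it succeeds or the attempts run out.

    Hypothetical helper: re-raises the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            last_error = exc
            # Wait before the next try, but not after the final failure.
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise last_error
```

With MLflow this might be used as `call_with_retries(lambda: mlflow.search_runs(experiment_ids=[<XXX>]), retry_on=(mlflow.exceptions.MlflowException,))`, which limits retries to the exception type shown in the stack trace.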

Stack trace

  File "/home/suser/<XXX>/lib/python3.10/site-packages/urllib3/response.py", line 748, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/suser/<XXX>/lib/python3.10/site-packages/urllib3/response.py", line 443, in _error_catcher
    yield
  File "/home/suser/<XXX>/lib/python3.10/site-packages/urllib3/response.py", line 815, in read_chunked
    self._update_chunk_length()
  File "/home/suser/<XXX>/lib/python3.10/site-packages/urllib3/response.py", line 752, in _update_chunk_length
    raise InvalidChunkLength(self, line)
urllib3.exceptions.InvalidChunkLength: InvalidChunkLength(got length b'', 0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/suser/<XXX>/lib/python3.10/site-packages/requests/models.py", line 816, in generate
    yield from self.raw.stream(chunk_size, decode_content=True)
  File "/home/suser/<XXX>/lib/python3.10/site-packages/urllib3/response.py", line 623, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/home/suser/<XXX>/lib/python3.10/site-packages/urllib3/response.py", line 803, in read_chunked
    with self._error_catcher():
  File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/suser/<XXX>/lib/python3.10/site-packages/urllib3/response.py", line 460, in _error_catcher
    raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/suser/<XXX>/lib/python3.10/site-packages/mlflow/utils/rest_utils.py", line 166, in http_request
    return _get_http_response_with_retries(
  File "/home/suser/<XXX>/lib/python3.10/site-packages/mlflow/utils/rest_utils.py", line 97, in _get_http_response_with_retries
    return session.request(method, url, **kwargs)
  File "/home/suser/<XXX>/lib/python3.10/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/suser/<XXX>/lib/python3.10/site-packages/requests/sessions.py", line 745, in send
    r.content
  File "/home/suser/<XXX>/lib/python3.10/site-packages/requests/models.py", line 899, in content
    self._content = b"".join(self.iter_content(CONTENT_CHUNK_SIZE)) or b""
  File "/home/suser/<XXX>/lib/python3.10/site-packages/requests/models.py", line 818, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/suser/.vscode-server/extensions/ms-python.python-2022.20.2/pythonFiles/lib/python/debugpy/adapter/…/…/debugpy/launcher/…/…/debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/home/suser/.vscode-server/extensions/ms-python.python-2022.20.2/pythonFiles/lib/python/debugpy/adapter/…/…/debugpy/launcher/…/…/debugpy/…/debugpy/server/cli.py", line 430, in main
    run()
  File "/home/suser/.vscode-server/extensions/ms-python.python-2022.20.2/pythonFiles/lib/python/debugpy/adapter/…/…/debugpy/launcher/…/…/debugpy/…/debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/home/suser/.vscode-server/extensions/ms-python.python-2022.20.2/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/suser/.vscode-server/extensions/ms-python.python-2022.20.2/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/suser/.vscode-server/extensions/ms-python.python-2022.20.2/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/home/suser/<XXX>/mlflow_results.py", line 46, in <module>
    runs = mlflow.search_runs(experiment_ids=[<XXX>])
  File "/home/suser/<XXX>/lib/python3.10/site-packages/mlflow/tracking/fluent.py", line 1448, in search_runs
    runs = get_results_from_paginated_fn(
  File "/home/suser/<XXX>/lib/python3.10/site-packages/mlflow/utils/__init__.py", line 249, in get_results_from_paginated_fn
    page_results = paginated_fn(max_results_per_page, next_page_token)
  File "/home/suser/<XXX>/lib/python3.10/site-packages/mlflow/tracking/fluent.py", line 1439, in pagination_wrapper_func
    return MlflowClient().search_runs(
  File "/home/suser/<XXX>/lib/python3.10/site-packages/mlflow/tracking/client.py", line 1665, in search_runs
    return self._tracking_client.search_runs(
  File "/home/suser/<XXX>/lib/python3.10/site-packages/mlflow/tracking/_tracking_service/client.py", line 511, in search_runs
    return self.store.search_runs(
  File "/home/suser/<XXX>/lib/python3.10/site-packages/mlflow/store/tracking/abstract_store.py", line 285, in search_runs
    runs, token = self._search_runs(
  File "/home/suser/<XXX>/lib/python3.10/site-packages/mlflow/store/tracking/rest_store.py", line 271, in _search_runs
    response_proto = self._call_endpoint(SearchRuns, req_body)
  File "/home/suser/<XXX>/lib/python3.10/site-packages/mlflow/store/tracking/rest_store.py", line 56, in _call_endpoint
    return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
  File "/home/suser/<XXX>/lib/python3.10/site-packages/mlflow/utils/rest_utils.py", line 277, in call_endpoint
    response = http_request(
  File "/home/suser/<XXX>/lib/python3.10/site-packages/mlflow/utils/rest_utils.py", line 184, in http_request
    raise MlflowException("API request to %s failed with exception %s" % (url, e))
mlflow.exceptions.MlflowException: API request to http://<XXX>/api/2.0/mlflow/runs/search failed with exception ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))

Other info / logs

What component(s) does this bug affect?

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

What language(s) does this bug affect?

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

About this issue

  • State: open
  • Created a year ago
  • Comments: 17 (6 by maintainers)

Most upvoted comments

Hi @Nika-St, I dug into similar issues that users have posted in the requests repo and looked into what this error actually is. Based on what I found, and on the fact that I can’t reproduce this with different tracking server configurations, I believe the communication error is either: a) an issue in your tracking server, or b) an incompatibility between your locally installed versions of urllib3 and requests.
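
One way to tell the two cases apart might be to call the same REST endpoint without requests or urllib3 in the path. The sketch below uses only the Python standard library. The function names are hypothetical, and it assumes the runs/search endpoint accepts a JSON body with `experiment_ids` and `max_results`. If this call fails in the same way, the server or something in front of it is the more likely culprit; if it succeeds reliably, the client-side libraries deserve a closer look.

```python
import json
import urllib.request


def build_search_request(tracking_uri, experiment_ids, max_results=100):
    """Build (but do not send) a POST request for the runs/search endpoint."""
    url = tracking_uri.rstrip("/") + "/api/2.0/mlflow/runs/search"
    body = json.dumps(
        {
            "experiment_ids": [str(e) for e in experiment_ids],
            "max_results": max_results,
        }
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search_runs_raw(tracking_uri, experiment_ids, max_results=100, timeout=30):
    """Send the request via the standard library and return the decoded JSON."""
    request = build_search_request(tracking_uri, experiment_ids, max_results)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `search_runs_raw` with the same tracking URI and experiment ID as the failing script, a few times in a row, should show whether the failure rate matches what the MLflow client sees.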

Can you try the following?

  1. Install the latest versions of urllib3 and requests.
  2. Upgrade your MLflow tracking server (currently 1.27.0, while the client is 2.0.1).
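
Before upgrading, it may be worth recording which versions are actually installed in the environment that runs the failing script. A minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical:

```python
from importlib import metadata


def installed_versions(packages=("mlflow", "requests", "urllib3")):
    """Return a dict of package name -> installed version (None if missing)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the same virtual environment as the script, before and after the upgrade, makes it easy to report exactly which combination fails.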

It’s a permanent assignment. Once you volunteer, you’re obliga- kidding. I’ll fix it for you and we’ll take a look at the bug! Thanks again for reporting! 😃