mlflow: Unable to use mlflow.projects.run API on my Spark Cluster

System information

Have I written custom code (as opposed to using a stock example script provided in MLflow): No
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux
MLflow installed from (source or binary):
MLflow version (run mlflow --version): 1.0.0
Python version: 3.7
npm version, if running the dev UI:
Exact command to reproduce:

Describe the problem

I have saved a Python file (the wine-quality script from the MLflow examples) and an MLProject file in my own GitHub repository. I am trying to log the model with the mlflow.projects.run API, passing the path of my GitHub repository. However, I get an error:

mlflow.exceptions.RestException: RESOURCE_DOES_NOT_EXIST: Run '2ad8d348b4bd4cf59f16d3dc8eea9bf7' not found

Code to reproduce issue

MLProject:

    entry_points:
      main:
        command: "/usr/bin/spark-submit train.py"

train.py: same as the MLflow example, except that I added a line to point the tracking URI at my MLflow server.

On my Spark cluster I run:

    import mlflow
    r = mlflow.projects.run("<my github path>", version="master", use_conda=False)

Other info / logs

    2019/08/07 05:13:41 INFO mlflow.projects: === Created directory /tmp/tmpowd4asyz for downloading remote URIs passed to arguments of type 'path' ===
    2019/08/07 05:13:41 INFO mlflow.projects: === Running command '/usr/bin/spark-submit train.py' in run with ID '2ad8d348b4bd4cf59f16d3dc8eea9bf7' ===
    Traceback (most recent call last):
      File "/tmp/tmpae63sriw/train.py", line 49, in <module>
        with mlflow.start_run():
      File "/usr/hdp/current/Anaconda/lib/python3.7/site-packages/mlflow/tracking/fluent.py", line 116, in start_run
        active_run_obj = MlflowClient().get_run(existing_run_id)
      File "/usr/hdp/current/Anaconda/lib/python3.7/site-packages/mlflow/tracking/client.py", line 49, in get_run
        return self.store.get_run(run_id)
      File "/usr/hdp/current/Anaconda/lib/python3.7/site-packages/mlflow/store/rest_store.py", line 133, in get_run
        response_proto = self._call_endpoint(GetRun, req_body)
      File "/usr/hdp/current/Anaconda/lib/python3.7/site-packages/mlflow/store/rest_store.py", line 69, in _call_endpoint
        response = self._verify_rest_response(response, endpoint)
      File "/usr/hdp/current/Anaconda/lib/python3.7/site-packages/mlflow/store/rest_store.py", line 52, in _verify_rest_response
        return verify_rest_response(response, endpoint)
      File "/usr/hdp/current/Anaconda/lib/python3.7/site-packages/mlflow/utils/rest_utils.py", line 84, in verify_rest_response
        raise RestException(json.loads(response.text))
    mlflow.exceptions.RestException: RESOURCE_DOES_NOT_EXIST: Run '2ad8d348b4bd4cf59f16d3dc8eea9bf7' not found
    19/08/07 05:13:46 INFO ShutdownHookManager: Shutdown hook called
    19/08/07 05:13:46 INFO ShutdownHookManager: Deleting directory /tmp/spark-c917b822-3cb8-469a-9b48-bd7d662fa2b4

About this issue

State: closed
Created 5 years ago
Comments: 15 (5 by maintainers)

Most upvoted comments

Thanks for the details. Did you also set the MLflow tracking URI in the program where you called mlflow.run? The mlflow.run call creates the run and then spawns a process (train.py in this case) that uses that run ID, so both processes need to talk to the same tracking server. This error message suggests that the run does not exist on the tracking server that train.py is using.
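A toy model may make the failure mode easier to see. The sketch below is a hypothetical simulation in plain Python, not MLflow code: ToyTrackingStores and launch_project are made-up names, and each tracking server is modelled as a set of run IDs keyed by its URI. The driver creates the run on its own URI, and the child looks the run up on whatever URI train.py ends up using; when the two differ, the lookup fails in the same way as the RESOURCE_DOES_NOT_EXIST error above.

```python
import uuid


class ToyTrackingStores:
    """Hypothetical stand-in for several tracking servers, keyed by URI."""

    def __init__(self):
        self._runs = {}  # tracking URI -> set of run IDs stored on that server

    def create_run(self, uri):
        # Register a fresh run ID on the server identified by `uri`.
        run_id = uuid.uuid4().hex
        self._runs.setdefault(uri, set()).add(run_id)
        return run_id

    def get_run(self, uri, run_id):
        # Look the run up only on the server identified by `uri`.
        if run_id not in self._runs.get(uri, set()):
            raise LookupError(f"RESOURCE_DOES_NOT_EXIST: Run '{run_id}' not found")
        return run_id


def launch_project(stores, driver_uri, child_uri):
    """Driver creates the run on driver_uri; the child resolves it on child_uri."""
    run_id = stores.create_run(driver_uri)    # roughly what mlflow.run does first
    return stores.get_run(child_uri, run_id)  # roughly what start_run() in train.py does
```

With hypothetical URIs, launch_project(stores, "file:./mlruns", "http://my-mlflow-server:5000") raises the LookupError, while passing the same server URI for both arguments returns the run ID.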

Basically, in your Python code before mlflow.run(), add mlflow.set_tracking_uri(your URI).

mateiz on Aug 20, 2019
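On the train.py side, a complementary option is to stop hard-coding the tracking URI. As far as I know, when mlflow.projects.run launches the entry point it passes the run ID (MLFLOW_RUN_ID) and the driver's tracking URI (MLFLOW_TRACKING_URI) to the child process as environment variables, so train.py can prefer that value and fall back to a fixed server only when it is run on its own. A minimal sketch, where the helper name resolve_tracking_uri and the default URI are hypothetical:

```python
import os

# Hypothetical default: replace with the address of your own MLflow tracking server.
DEFAULT_TRACKING_URI = "http://my-mlflow-server:5000"


def resolve_tracking_uri(environ=None, default=DEFAULT_TRACKING_URI):
    """Prefer the tracking URI handed down by the parent process; else use a default.

    If the parent (e.g. mlflow.projects.run) exported MLFLOW_TRACKING_URI,
    reusing it keeps train.py on the same tracking server as the driver.
    If the variable is unset or empty, the fixed default is returned.
    """
    env = os.environ if environ is None else environ
    return env.get("MLFLOW_TRACKING_URI") or default
```

In train.py you would then call mlflow.set_tracking_uri(resolve_tracking_uri()) in place of the hard-coded line. This does not replace setting the tracking URI in the driver before mlflow.run: if the driver is left on its default local store, train.py would presumably follow it there, and the run would be logged locally rather than on your server.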