mlflow: [BUG] RESOURCE_DOES_NOT_EXIST when calling mlflow.start_run()

Willingness to contribute

The MLflow Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the MLflow code base?

Yes. I can contribute a fix for this bug independently.
Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
No. I cannot contribute a bug fix at this time.

System information

Have I written custom code (as opposed to using a stock example script provided in MLflow):
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04 - AWS EC2
MLflow installed from (source or binary): conda
MLflow version (run mlflow --version): mlflow, version 1.20.2
Python version: 3.6.9
npm version, if running the dev UI:
Exact command to reproduce: mlflow.start_run()

Describe the problem

I have a remote tracking server (I believe the access policies from the EC2 instance to the server are set correctly, but I'm not 100% sure). I have a main (parent) run, and under that parent I also have a few child runs. The issue occurs at the first start_run() call (the parent run): when the script reaches with mlflow.start_run(), it crashes.

The server responds with RESOURCE_DOES_NOT_EXIST when looking up the run_id.

Code to reproduce issue

import os

import mlflow


def go(config):
    remote_server_uri = "http://x.x.x.x:xxxx"  # set to your server URI
    mlflow.set_tracking_uri(remote_server_uri)
    mlflow.set_experiment('/cargo_movement')
    # You can get the path at the root of the MLflow project with this:
    root_path = os.path.abspath('.')

    # Check which steps we need to execute
    if isinstance(config["main"]["execute_steps"], str):
        # This was passed on the command line as a comma-separated list of steps
        steps_to_execute = config["main"]["execute_steps"].split(",")
    else:
        # Otherwise, treat the value as an iterable of step names
        steps_to_execute = list(config["main"]["execute_steps"])

    with mlflow.start_run() as parent_run:
        # Download step
        if "1_download" in steps_to_execute:
            # Launch the sub-project and pass it the parent run ID as a parameter
            _ = mlflow.run(
                os.path.join(root_path, "1_download"),
                "main",
                parameters={
                    "parent_run_id": parent_run.info.run_id,
                }
            )
        ...

Other info / logs

$ mlflow run .
2021/09/18 13:30:47 INFO mlflow.projects.utils: === Created directory /tmp/tmpy661fhzb for downloading remote URIs passed to arguments of type 'path' ===
2021/09/18 13:30:47 INFO mlflow.projects.backend.local: === Running command 'source /home/ubuntu/anaconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-167823303a9c0913bc4240ea63b3cb92329b0538 1>&2 && python main.py' in run with ID 'f7b8bafb58404dcb8e27ae1b901b2524' === 
ENV VAR: f7b8bafb58404dcb8e27ae1b901b2524
Traceback (most recent call last):
  File "/home/ubuntu/fchardnet/main.py", line 109, in <module>
    go(config)
  File "/home/ubuntu/fchardnet/main.py", line 25, in go
    with mlflow.start_run() as parent_run:
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/tracking/fluent.py", line 204, in start_run
    active_run_obj = client.get_run(existing_run_id)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/tracking/client.py", line 150, in get_run
    return self._tracking_client.get_run(run_id)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/tracking/_tracking_service/client.py", line 65, in get_run
    return self.store.get_run(run_id)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/store/tracking/rest_store.py", line 132, in get_run
    response_proto = self._call_endpoint(GetRun, req_body)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/store/tracking/rest_store.py", line 56, in _call_endpoint
    return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/utils/rest_utils.py", line 217, in call_endpoint
    response = verify_rest_response(response, endpoint)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/utils/rest_utils.py", line 169, in verify_rest_response
    raise RestException(json.loads(response.text))
mlflow.exceptions.RestException: RESOURCE_DOES_NOT_EXIST: Run with id=f7b8bafb58404dcb8e27ae1b901b2524 not found
2021/09/18 13:30:48 ERROR mlflow.cli: === Run (ID 'f7b8bafb58404dcb8e27ae1b901b2524') failed ===
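
One plausible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: `mlflow run` creates the run in whatever tracking store the launcher is configured for and hands its ID to the script through the `MLFLOW_RUN_ID` environment variable. The script then points at a different server with `mlflow.set_tracking_uri()`, where that run does not exist. The sketch below uses only the standard library. The helper name `diagnose_run_context` and the example URIs are hypothetical, and the comparison is a plain string match, so it may flag URIs that differ only in formatting.

```python
import os


def diagnose_run_context(script_tracking_uri, env=None):
    """Return a warning string if the launcher and the script appear to
    target different tracking stores, otherwise None.

    Hypothetical helper for illustration; it is not part of MLflow.
    """
    env = os.environ if env is None else env

    run_id = env.get("MLFLOW_RUN_ID")
    if run_id is None:
        # No inherited run ID: start_run() would create a fresh run.
        return None

    launcher_uri = env.get("MLFLOW_TRACKING_URI")
    if launcher_uri != script_tracking_uri:
        return (
            f"Run {run_id} was created under tracking URI {launcher_uri!r}, "
            f"but the script targets {script_tracking_uri!r}; "
            "looking the run up there may fail."
        )
    return None


if __name__ == "__main__":
    # Example values only; replace with the URI used in your script.
    print(diagnose_run_context("http://tracking.example:5000"))
```

Calling this at the top of main.py, before `mlflow.set_tracking_uri()`, would show whether the two sides disagree in a given launch.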

What component(s), interfaces, languages, and integrations does this bug affect?

Components

area/artifacts: Artifact stores and artifact logging
area/build: Build and test infrastructure for MLflow
area/docs: MLflow documentation pages
area/examples: Example code
area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
area/models: MLmodel format, model serialization/deserialization, flavors
area/projects: MLproject format, project running backends
area/scoring: MLflow Model server, model deployment tools, Spark UDFs
area/server-infra: MLflow Tracking server backend
area/tracking: Tracking Service, tracking client APIs, autologging

Interface

area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

About this issue

State: open
Created 3 years ago
Reactions: 4
Comments: 16 (1 by maintainers)

Most upvoted comments

mlflow run . --experiment_name="some-experiment-name" --tracking_uri="some-tracking-uri"

There is no parameter called --tracking_uri. The parameter --experiment_name should be --experiment-name. Unfortunately, this does not work for me; I tried removing the with mlflow.start_run() block and keeping mlflow.set_tracking_uri() in the code.

Well, you can try these steps.

Export the MLflow tracking server variables as shown below.

export MLFLOW_TRACKING_URI=your_tracking_uri
export MLFLOW_EXPERIMENT_NAME="your_experiment_name"

Run your MLflow Project with this command line.

mlflow run [your/where/MLproject Folder] --no-conda # if you don't want to use a conda env

Notes:

You must remove mlflow.start_run() from your Python code. If you don't remove this line, it will create 2 running experiments and cause errors.
You don't have to call mlflow.set_tracking_uri(), because the URI is already set in your environment variables.

Hope it will work for you!

Clayrisee on May 9, 2022
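
The environment-variable approach above can also be driven from Python. This is a minimal sketch under stated assumptions: the helper name `build_mlflow_run` and the example values are hypothetical, and the `--no-conda` flag is taken from the comment above (newer MLflow releases may expect a different option). `MLFLOW_TRACKING_URI` and `MLFLOW_EXPERIMENT_NAME` are the variables named in that comment.

```python
import os


def build_mlflow_run(project_dir, tracking_uri, experiment_name, base_env=None):
    """Build the command and environment for launching an MLflow project.

    Hypothetical helper: it only assembles values and runs nothing itself.
    """
    # Copy the environment so the caller's process is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["MLFLOW_TRACKING_URI"] = tracking_uri
    env["MLFLOW_EXPERIMENT_NAME"] = experiment_name

    # Same command as in the comment above, expressed as an argument list.
    cmd = ["mlflow", "run", project_dir, "--no-conda"]
    return cmd, env


if __name__ == "__main__":
    # Example values only.
    cmd, env = build_mlflow_run(".", "http://tracking.example:5000", "some-experiment-name")
    print(cmd)
```

The returned pair can be passed to `subprocess.run(cmd, env=env, check=True)`, so the launcher and the script see the same tracking URI and experiment.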

Run the script directly with python /path/to/file.py and the MLflow Python API will work. mlflow run and the Python API do not work nicely together.

ghost on Aug 19, 2022

Hi, is there any workaround for this?

I have the same problem. I think the current solution is to remove the Python API call mlflow.start_run() and manually pass the experiment name when you run this command:

mlflow run . --experiment_name="some-experiment-name" --tracking_uri="some-tracking-uri"

Or you can set environment variables for the experiment name and tracking URI.

Clayrisee on Apr 1, 2022

When calling the script without mlflow.set_tracking_uri(remote_server_uri), I get:

mlflow run .
2021/09/18 14:06:55 INFO mlflow.projects.utils: === Created directory /tmp/tmphpullqjs for downloading remote URIs passed to arguments of type 'path' ===
2021/09/18 14:06:55 INFO mlflow.projects.backend.local: === Running command 'source /home/ubuntu/anaconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-167823303a9c0913bc4240ea63b3cb92329b0538 1>&2 && python main.py' in run with ID 'ac0582aec6a44f19899f5dfcba02cc39' === 
INFO: 'cargo_movement' does not exist. Creating a new experiment
ENV VAR: ac0582aec6a44f19899f5dfcba02cc39
Traceback (most recent call last):
  File "/home/ubuntu/fchardnet/main.py", line 109, in <module>
    go(config)
  File "/home/ubuntu/fchardnet/main.py", line 25, in go
    with mlflow.start_run() as parent_run:
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/tracking/fluent.py", line 210, in start_run
    raise MlflowException(
mlflow.exceptions.MlflowException: Cannot start run with ID ac0582aec6a44f19899f5dfcba02cc39 because active run ID does not match environment run ID. Make sure --experiment-name or --experiment-id matches experiment set with set_experiment(), or just use command-line arguments
2021/09/18 14:06:56 ERROR mlflow.cli: === Run (ID 'ac0582aec6a44f19899f5dfcba02cc39') failed ===

Jakubelo on Sep 18, 2021
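
The error message in this second traceback points at a mismatch between the experiment chosen with set_experiment() and the one the launcher used. A possible guard, sketched here as an assumption and not a confirmed fix, is to configure the tracking URI and experiment in code only when the script was not launched by `mlflow run`, judged by the absence of `MLFLOW_RUN_ID`. The helper name `configure_tracking` is hypothetical. The `mlflow` module is passed in as an argument, so the block itself has no third-party imports.

```python
import os


def configure_tracking(mlflow_module, remote_uri, experiment_name, env=None):
    """Set the tracking URI and experiment only for direct `python main.py` runs.

    Returns True if the settings were applied, or False if they were skipped
    because a launcher-provided run ID is present. Hypothetical helper.
    """
    env = os.environ if env is None else env

    if "MLFLOW_RUN_ID" in env:
        # Launched by `mlflow run`: keep the launcher's tracking URI and
        # experiment so the inherited run ID can still be resolved.
        return False

    mlflow_module.set_tracking_uri(remote_uri)
    mlflow_module.set_experiment(experiment_name)
    return True
```

In main.py this would replace the two unconditional calls, for example `configure_tracking(mlflow, remote_server_uri, '/cargo_movement')` placed before `with mlflow.start_run()`.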

mlflow run . --experiment_name="some-experiment-name" --tracking_uri="some-tracking-uri"

There is no parameter called --tracking_uri. The parameter --experiment_name should be --experiment-name. Unfortunately, this does not work for me; I tried removing the with mlflow.start_run() block and keeping mlflow.set_tracking_uri() in the code.

ArtificialTruth on Apr 29, 2022