mlflow: [BUG] RESOURCE_DOES_NOT_EXIST when calling mlflow.start_run()

Willingness to contribute

The MLflow Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the MLflow code base?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
  • No. I cannot contribute a bug fix at this time.

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow):
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04 - AWS EC2
  • MLflow installed from (source or binary): conda
  • MLflow version (run mlflow --version): mlflow, version 1.20.2
  • Python version: 3.6.9
  • npm version, if running the dev UI:
  • Exact command to reproduce: mlflow.start_run()

Describe the problem

I have a remote tracking server (I believe the access policies from the EC2 instance to the server are set correctly, but I'm not 100% sure). I have a main (parent) run, and under that parent I also have a few child runs. The issue is with the first start_run() call (the parent run): when the script reaches with mlflow.start_run(), it crashes.

The response from the server is RESOURCE_DOES_NOT_EXIST when looking up the run_id.

Code to reproduce issue

import os

import mlflow


def go(config):
    remote_server_uri = "http://x.x.x.x:xxxx"  # set to your server URI
    mlflow.set_tracking_uri(remote_server_uri)
    mlflow.set_experiment('/cargo_movement')
    # You can get the path at the root of the MLflow project with this:
    root_path = os.path.abspath('.')

    # Check which steps we need to execute
    if isinstance(config["main"]["execute_steps"], str):
        # This was passed on the command line as a comma-separated list of steps
        steps_to_execute = config["main"]["execute_steps"].split(",")
    else:
        steps_to_execute = list(config["main"]["execute_steps"])

    # Parent run; each step below is launched as a child MLflow project run
    with mlflow.start_run() as parent_run:
        # Download step
        if "1_download" in steps_to_execute:
            _ = mlflow.run(
                os.path.join(root_path, "1_download"),
                "main",
                parameters={
                    "parent_run_id": parent_run.info.run_id,
                },
            )
        ...

Other info / logs

$ mlflow run .
2021/09/18 13:30:47 INFO mlflow.projects.utils: === Created directory /tmp/tmpy661fhzb for downloading remote URIs passed to arguments of type 'path' ===
2021/09/18 13:30:47 INFO mlflow.projects.backend.local: === Running command 'source /home/ubuntu/anaconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-167823303a9c0913bc4240ea63b3cb92329b0538 1>&2 && python main.py' in run with ID 'f7b8bafb58404dcb8e27ae1b901b2524' === 
ENV VAR: f7b8bafb58404dcb8e27ae1b901b2524
Traceback (most recent call last):
  File "/home/ubuntu/fchardnet/main.py", line 109, in <module>
    go(config)
  File "/home/ubuntu/fchardnet/main.py", line 25, in go
    with mlflow.start_run() as parent_run:
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/tracking/fluent.py", line 204, in start_run
    active_run_obj = client.get_run(existing_run_id)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/tracking/client.py", line 150, in get_run
    return self._tracking_client.get_run(run_id)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/tracking/_tracking_service/client.py", line 65, in get_run
    return self.store.get_run(run_id)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/store/tracking/rest_store.py", line 132, in get_run
    response_proto = self._call_endpoint(GetRun, req_body)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/store/tracking/rest_store.py", line 56, in _call_endpoint
    return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/utils/rest_utils.py", line 217, in call_endpoint
    response = verify_rest_response(response, endpoint)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/utils/rest_utils.py", line 169, in verify_rest_response
    raise RestException(json.loads(response.text))
mlflow.exceptions.RestException: RESOURCE_DOES_NOT_EXIST: Run with id=f7b8bafb58404dcb8e27ae1b901b2524 not found
2021/09/18 13:30:48 ERROR mlflow.cli: === Run (ID 'f7b8bafb58404dcb8e27ae1b901b2524') failed ===

What component(s), interfaces, languages, and integrations does this bug affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

About this issue

  • Original URL
  • State: open
  • Created 3 years ago
  • Reactions: 4
  • Comments: 16 (1 by maintainers)

Most upvoted comments

mlflow run . --experiment_name="some-experiment-name" --tracking_uri="some-tracking-uri"

There is no parameter called --tracking_uri, and the parameter --experiment_name should be --experiment-name. Unfortunately, this does not work for me. I tried removing with mlflow.start_run() and keeping mlflow.set_tracking_uri() in the code.
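The correction above can be sketched from a Python launcher. This is only a minimal sketch: it assumes the tracking URI is passed through the MLFLOW_TRACKING_URI environment variable (as suggested elsewhere in this thread) instead of a CLI flag. build_mlflow_run_command is a hypothetical helper name, and the experiment name and URI are placeholders.

```python
import os


def build_mlflow_run_command(project_dir, experiment_name, tracking_uri, base_env=None):
    """Build the argv and environment for `mlflow run` (hypothetical helper)."""
    env = dict(os.environ if base_env is None else base_env)
    # The tracking URI travels via the environment, because the comment above
    # reports that `mlflow run` has no --tracking_uri option.
    env["MLFLOW_TRACKING_URI"] = tracking_uri
    # Note the hyphen: --experiment-name, not --experiment_name.
    cmd = ["mlflow", "run", project_dir, "--experiment-name", experiment_name]
    return cmd, env


cmd, env = build_mlflow_run_command(".", "some-experiment-name", "some-tracking-uri")
print(" ".join(cmd))
# To actually launch the project:
#   import subprocess
#   subprocess.run(cmd, env=env, check=True)
```

Building the command as a list avoids shell quoting problems with experiment names that contain spaces.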

Well, you can try these steps.

  1. Export the MLflow tracking server variables as shown below.
export MLFLOW_TRACKING_URI=your_tracking_uri
export MLFLOW_EXPERIMENT_NAME="your_experiment_name"
  2. Run your MLflow Project with this command.
mlflow run [your/where/MLproject Folder] --no-conda # if you don't want to use a conda env

Notes:

  • You must remove mlflow.start_run() from your Python code. If you keep this line, it will create two running experiments and cause errors.
  • You don't have to call mlflow.set_tracking_uri(), because the tracking URI is already set in your environment variables.

Hope it will work for you!
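On the script side, the notes above could be applied with a small guard like the following. It is a sketch under assumptions, not MLflow's own API: launched_by_mlflow_run and settings_to_apply are hypothetical helpers, and the check relies on the assumption that mlflow run exports MLFLOW_RUN_ID to the child process (the "ENV VAR:" line in the logs matches the run ID, which suggests this). MLFLOW_TRACKING_URI and MLFLOW_EXPERIMENT_NAME are the variables from step 1.

```python
import os


def launched_by_mlflow_run(env=None):
    """Guess whether `mlflow run` started this process.

    Assumption: `mlflow run` exports MLFLOW_RUN_ID to the child process.
    """
    env = os.environ if env is None else env
    return "MLFLOW_RUN_ID" in env


def settings_to_apply(env=None, tracking_uri=None, experiment_name=None):
    """Return only the settings the script should still set by itself."""
    env = os.environ if env is None else env
    if launched_by_mlflow_run(env):
        # The launcher already chose the server and experiment; do not override.
        return {}
    settings = {}
    if tracking_uri and "MLFLOW_TRACKING_URI" not in env:
        settings["tracking_uri"] = tracking_uri
    if experiment_name and "MLFLOW_EXPERIMENT_NAME" not in env:
        settings["experiment_name"] = experiment_name
    return settings


# Intended use inside the script (requires mlflow to be installed):
#   settings = settings_to_apply(tracking_uri="http://x.x.x.x:xxxx",
#                                experiment_name="/cargo_movement")
#   if "tracking_uri" in settings:
#       mlflow.set_tracking_uri(settings["tracking_uri"])
#   if "experiment_name" in settings:
#       mlflow.set_experiment(settings["experiment_name"])
```

With this guard the same file can be started with either python main.py or mlflow run without the two configurations contradicting each other.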

Run the script with python /path/to/file.py and the MLflow Python API will work. mlflow run and the Python API do not work nicely together.

Hi, is there any workaround for this?

I have the same problem. I think the current solution is to remove the Python API call mlflow.start_run() and pass the experiment name manually when you run this command.

mlflow run . --experiment_name="some-experiment-name" --tracking_uri="some-tracking-uri" 

or you can set the environment variable for experiment_name and tracking_uri.

When I run the script without mlflow.set_tracking_uri(remote_server_uri), I get:

mlflow run .
2021/09/18 14:06:55 INFO mlflow.projects.utils: === Created directory /tmp/tmphpullqjs for downloading remote URIs passed to arguments of type 'path' ===
2021/09/18 14:06:55 INFO mlflow.projects.backend.local: === Running command 'source /home/ubuntu/anaconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-167823303a9c0913bc4240ea63b3cb92329b0538 1>&2 && python main.py' in run with ID 'ac0582aec6a44f19899f5dfcba02cc39' === 
INFO: 'cargo_movement' does not exist. Creating a new experiment
ENV VAR: ac0582aec6a44f19899f5dfcba02cc39
Traceback (most recent call last):
  File "/home/ubuntu/fchardnet/main.py", line 109, in <module>
    go(config)
  File "/home/ubuntu/fchardnet/main.py", line 25, in go
    with mlflow.start_run() as parent_run:
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/tracking/fluent.py", line 210, in start_run
    raise MlflowException(
mlflow.exceptions.MlflowException: Cannot start run with ID ac0582aec6a44f19899f5dfcba02cc39 because active run ID does not match environment run ID. Make sure --experiment-name or --experiment-id matches experiment set with set_experiment(), or just use command-line arguments
2021/09/18 14:06:56 ERROR mlflow.cli: === Run (ID 'ac0582aec6a44f19899f5dfcba02cc39') failed ===
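The two tracebacks in this thread are consistent with the following toy model. It is a guess at the behaviour, not MLflow's actual code: it assumes start_run() picks up the run ID that mlflow run exported (assumed here to be MLFLOW_RUN_ID), looks it up in whichever store the script has selected, and then compares experiments. All names below (RunNotFound, ExperimentMismatch, resume_env_run, the store dictionaries) are hypothetical.

```python
class RunNotFound(Exception):
    """Stands in for RESOURCE_DOES_NOT_EXIST in this toy model."""


class ExperimentMismatch(Exception):
    """Stands in for the 'active run ID does not match' error."""


def resume_env_run(env, store, active_experiment_id=None):
    """Toy lookup. `store` maps run_id -> experiment_id for one tracking server."""
    run_id = env.get("MLFLOW_RUN_ID")
    if run_id is None:
        return None  # nothing to resume; a fresh run would be created instead
    if run_id not in store:
        # The run was created on a different server than the one being queried.
        raise RunNotFound(f"Run with id={run_id} not found")
    if active_experiment_id is not None and active_experiment_id != store[run_id]:
        # set_experiment() selected a different experiment than the launcher used.
        raise ExperimentMismatch(f"Cannot start run with ID {run_id}")
    return run_id


env = {"MLFLOW_RUN_ID": "run-created-by-launcher"}
launcher_store = {"run-created-by-launcher": "0"}  # where `mlflow run .` created the run
remote_store = {}  # the server chosen by set_tracking_uri() in the script

for store, experiment in [(remote_store, None), (launcher_store, "1")]:
    try:
        resume_env_run(env, store, active_experiment_id=experiment)
    except (RunNotFound, ExperimentMismatch) as exc:
        print(type(exc).__name__, exc)
```

If this model is right, the launcher and the script have to agree on both the tracking URI and the experiment, which is what the environment-variable workaround above achieves.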
