mlflow: [BUG] RESOURCE_DOES_NOT_EXIST when calling mlflow.start_run()
Willingness to contribute
The MLflow Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the MLflow code base?
- Yes. I can contribute a fix for this bug independently.
- Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
- No. I cannot contribute a bug fix at this time.
System information
- Have I written custom code (as opposed to using a stock example script provided in MLflow):
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04 - AWS EC2
- MLflow installed from (source or binary): conda
- MLflow version (run mlflow --version): mlflow, version 1.20.2
- Python version: 3.6.9
- npm version, if running the dev UI:
- Exact command to reproduce: mlflow.start_run()
Describe the problem
I have a remote tracking server (the access policies from the EC2 instance to the server are set correctly, though I'm not 100% sure).
I have a main (parent) run, and under that parent I also have a few child runs. The issue is with the first start_run() call (the parent run): when the script reaches with mlflow.start_run(), it crashes.
The response from the server is RESOURCE_DOES_NOT_EXIST when it looks up the run_id.
Code to reproduce issue
remote_server_uri = "http://x.x.x.x:xxxx"  # set to your server URI
mlflow.set_tracking_uri(remote_server_uri)
mlflow.set_experiment('/cargo_movement')
# You can get the path at the root of the MLflow project with this:
root_path = os.path.abspath('.')
# Check which steps we need to execute
if isinstance(config["main"]["execute_steps"], str):
    # This was passed on the command line as a comma-separated list of steps
    steps_to_execute = config["main"]["execute_steps"].split(",")
else:
    steps_to_execute = list(config["main"]["execute_steps"])
with mlflow.start_run() as parent_run:
    # Download step
    if "1_download" in steps_to_execute:
        _ = mlflow.run(
            os.path.join(root_path, "1_download"),
            "main",
            parameters={
                "parent_run_id": parent_run.info.run_id,
            }
        )
    ...
Other info / logs
$ mlflow run .
2021/09/18 13:30:47 INFO mlflow.projects.utils: === Created directory /tmp/tmpy661fhzb for downloading remote URIs passed to arguments of type 'path' ===
2021/09/18 13:30:47 INFO mlflow.projects.backend.local: === Running command 'source /home/ubuntu/anaconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-167823303a9c0913bc4240ea63b3cb92329b0538 1>&2 && python main.py' in run with ID 'f7b8bafb58404dcb8e27ae1b901b2524' ===
ENV VAR: f7b8bafb58404dcb8e27ae1b901b2524
Traceback (most recent call last):
  File "/home/ubuntu/fchardnet/main.py", line 109, in <module>
    go(config)
  File "/home/ubuntu/fchardnet/main.py", line 25, in go
    with mlflow.start_run() as parent_run:
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/tracking/fluent.py", line 204, in start_run
    active_run_obj = client.get_run(existing_run_id)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/tracking/client.py", line 150, in get_run
    return self._tracking_client.get_run(run_id)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/tracking/_tracking_service/client.py", line 65, in get_run
    return self.store.get_run(run_id)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/store/tracking/rest_store.py", line 132, in get_run
    response_proto = self._call_endpoint(GetRun, req_body)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/store/tracking/rest_store.py", line 56, in _call_endpoint
    return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/utils/rest_utils.py", line 217, in call_endpoint
    response = verify_rest_response(response, endpoint)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/utils/rest_utils.py", line 169, in verify_rest_response
    raise RestException(json.loads(response.text))
mlflow.exceptions.RestException: RESOURCE_DOES_NOT_EXIST: Run with id=f7b8bafb58404dcb8e27ae1b901b2524 not found
2021/09/18 13:30:48 ERROR mlflow.cli: === Run (ID 'f7b8bafb58404dcb8e27ae1b901b2524') failed ===
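One way to read the log above: start_run() in fluent.py calls client.get_run(existing_run_id), and the ID it looks up is the same one that mlflow run printed when it launched main.py (and the same one on the ENV VAR line). That suggests the script inherits a run ID from its environment and then looks it up on whichever tracking URI is active at that moment. A minimal diagnostic sketch follows, assuming the variables that mlflow run exports to the child process are MLFLOW_RUN_ID, MLFLOW_TRACKING_URI and MLFLOW_EXPERIMENT_ID; the helper names are hypothetical and not part of MLflow.

```python
import os

# Assumption: these are the variables `mlflow run` exports to the child process.
INHERITED_VARS = ("MLFLOW_RUN_ID", "MLFLOW_TRACKING_URI", "MLFLOW_EXPERIMENT_ID")


def describe_inherited_run(environ=None):
    """Return the MLflow-related variables found in the environment (hypothetical helper)."""
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in INHERITED_VARS}


def report(environ=None):
    """Build a short human-readable summary of what the script inherited."""
    info = describe_inherited_run(environ)
    lines = [f"{name} = {value!r}" for name, value in info.items()]
    if info["MLFLOW_RUN_ID"] is not None:
        # Per the traceback, start_run() looks an existing run ID up via get_run().
        lines.append(
            "A run ID was inherited; start_run() is expected to look it up "
            "on the tracking URI that is active when it is called."
        )
    return "\n".join(lines)


# Call this at the top of main.py, before mlflow.set_tracking_uri(...).
print(report())
```

If the inherited tracking URI differs from the one passed to mlflow.set_tracking_uri(), the run created by mlflow run would not exist on the remote server, which would be consistent with the RESOURCE_DOES_NOT_EXIST response.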
What component(s), interfaces, languages, and integrations does this bug affect?
Components
- area/artifacts: Artifact stores and artifact logging
- area/build: Build and test infrastructure for MLflow
- area/docs: MLflow documentation pages
- area/examples: Example code
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/projects: MLproject format, project running backends
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/server-infra: MLflow Tracking server backend
- area/tracking: Tracking Service, tracking client APIs, autologging
Interface
- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
- area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- area/windows: Windows support
Language
- language/r: R APIs and clients
- language/java: Java APIs and clients
- language/new: Proposals for new client languages
Integrations
- integrations/azure: Azure and Azure ML integrations
- integrations/sagemaker: SageMaker integrations
- integrations/databricks: Databricks integrations
About this issue
- Original URL
- State: open
- Created 3 years ago
- Reactions: 4
- Comments: 16 (1 by maintainers)
Well, you can try this step.
Notes:
mlflow.set_tracking_uri(), because it is already set in your environment variables. Hope it will work for you!
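The note above can be read as: when the tracking URI already comes from the environment, the script should not override it. A minimal sketch of that idea, assuming the environment variable is MLFLOW_TRACKING_URI; resolve_tracking_uri is a hypothetical helper, not an MLflow function.

```python
import os


def resolve_tracking_uri(fallback_uri, environ=None):
    """Prefer the tracking URI from the environment; otherwise use the fallback.

    Hypothetical helper: it only picks a URI and never talks to MLflow itself.
    """
    env = os.environ if environ is None else environ
    return env.get("MLFLOW_TRACKING_URI") or fallback_uri


# Intended use inside main.py (commented out so the sketch runs without mlflow):
# import mlflow
# mlflow.set_tracking_uri(resolve_tracking_uri("http://x.x.x.x:xxxx"))
```

With this, a script started by mlflow run keeps the tracking URI it inherited, so the run ID it inherited is looked up on the same server that created it, while a script started with plain python still falls back to the hard-coded server.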
Run using python /path/to/file.py and the Python API of MLflow will work. Running mlflow run and using the Python API do not work nicely together.
I have the same problem. I think the current solution is to remove the Python API call mlflow.start_run() and manually add the experiment name when you run this command.
Or you can set the environment variables for experiment_name and tracking_uri.
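The suggestion above (configure the tracking server and experiment outside the script, then launch with mlflow run) could be sketched as follows. This assumes MLFLOW_TRACKING_URI is the environment variable read by the mlflow CLI and that the experiment is passed with the --experiment-name option; build_launch and launch are hypothetical helpers, and the URI is the placeholder from the issue.

```python
import os
import subprocess


def build_launch(tracking_uri, experiment_name, project_dir=".", base_env=None):
    """Return the command and environment for launching the project (hypothetical helper)."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: the mlflow CLI picks the tracking server up from this variable.
    env["MLFLOW_TRACKING_URI"] = tracking_uri
    cmd = ["mlflow", "run", project_dir, "--experiment-name", experiment_name]
    return cmd, env


def launch(tracking_uri, experiment_name, project_dir="."):
    """Run the project; raises CalledProcessError if the run fails."""
    cmd, env = build_launch(tracking_uri, experiment_name, project_dir)
    return subprocess.run(cmd, env=env, check=True)


# Example (not executed here):
# launch("http://x.x.x.x:xxxx", "/cargo_movement")
```

Because the run is then created on the remote server by mlflow run itself, the script would no longer need its own mlflow.set_tracking_uri() or mlflow.set_experiment() calls.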
When calling the script without mlflow.set_tracking_uri(remote_server_uri), I get:
There is no parameter called --tracking_uri. The parameter --experiment_name should be --experiment-name.
Unfortunately this does not work for me. I tried to remove with mlflow.start_run() and keep mlflow.set_tracking_uri() in the code.