mlflow: [BUG] MLflow commands fail to execute concurrently

#1769

System information

Have I written custom code (as opposed to using a stock example script provided in MLflow): Nope
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Jenkins agent, miniconda docker running on linuxkit
MLflow installed from (source or binary): binary
MLflow version (run mlflow --version): 0.9.1 (but the same for 1.4.0)
Python version: 2.7
npm version, if running the dev UI:
Exact command to reproduce: mlflow run -e test ./test-model and mlflow run -e build ./test-model, started concurrently

Describe the problem

I run a CI pipeline on Jenkins. Because the tests and training take a long time, I wanted to parallelize them. The pipeline fails when the two stages run concurrently but passes when they run sequentially.

Code to reproduce issue

 stage('Parallel') {
     failFast true
     parallel {
         stage('MLFlow test') {
             steps {
                 sh 'mlflow run -e test ./test-model'
             }
         }
         stage('MLFlow build') {
             steps {
                 sh 'mlflow run -e build ./test-model'
             }
         }
     }
 }

Other info / logs


2019/12/06 16:05:59 INFO mlflow.projects: === Creating conda environment mlflow-dee3cb9d8ec1d078ac23cda04ee09d2ead6fbbbd ===
Collecting package metadata (repodata.json): ...working...
Traceback (most recent call last):
  File "/opt/conda/bin/mlflow", line 10, in <module>
    sys.exit(cli())
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/opt/conda/lib/python2.7/site-packages/mlflow/cli.py", line 139, in run
    run_id=run_id,
  File "/opt/conda/lib/python2.7/site-packages/mlflow/projects/__init__.py", line 230, in run
    storage_dir=storage_dir, block=block, run_id=run_id)
  File "/opt/conda/lib/python2.7/site-packages/mlflow/projects/__init__.py", line 132, in _run
    conda_env_name = _get_or_create_conda_env(project.conda_env_path)
  File "/opt/conda/lib/python2.7/site-packages/mlflow/projects/__init__.py", line 462, in _get_or_create_conda_env
    conda_env_path], stream_output=True)
  File "/opt/conda/lib/python2.7/site-packages/mlflow/utils/process.py", line 38, in exec_cmd
    raise ShellCommandException("Non-zero exitcode: %s" % (exit_code))
mlflow.utils.process.ShellCommandException: Non-zero exitcode: -9

script returned exit code 1
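
A note on reading the exit code above: Python's subprocess module reports a return code of -N when the child process was terminated by signal N, so -9 would mean the conda process received SIGKILL rather than exiting on its own. This assumes mlflow's exec_cmd passes the child's return code through unchanged, which the message format suggests but the log does not prove. On Linux CI agents a SIGKILL often comes from the out-of-memory killer, but nothing in this log confirms that. The sketch below (Python 3, POSIX only; describe_exit_code is a hypothetical helper, not part of mlflow) shows the convention:

```python
import signal
import subprocess
import sys


def describe_exit_code(code):
    """Return a human-readable description of a subprocess return code."""
    if code < 0:
        # On POSIX, a negative code -N means the child was terminated by signal N.
        return "killed by %s" % signal.Signals(-code).name
    return "exited with status %d" % code


# Start a child that would sleep for a minute, then kill it immediately.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
child.kill()  # sends SIGKILL on POSIX
print(describe_exit_code(child.wait()))  # on POSIX this prints: killed by SIGKILL
```

By the same convention, the "Non-zero exitcode: 1" seen when conda cannot resolve a package is an ordinary failure exit, not a signal.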

About this issue

State: closed
Created 5 years ago
Comments: 15 (5 by maintainers)

Most upvoted comments

Ok, thank you. That sounds like a workaround for us.

But I would still prefer to let mlflow parse its own MLproject format rather than having us figure out which conda environment to create.

Tradunsky on Feb 11, 2020

Ah, my mistake: I ran the commands concurrently but copy-pasted the wrong output. It should be:

(base) $: mlflow run -e build ~/mlflow-example-project & mlflow run -e test ~/mlflow-example-project
[1] 72371
2020/02/06 16:53:52 INFO mlflow.projects: === Created directory /var/folders/tg/vkmyk0kj2615vnmh0tz_88wr0000gp/T/tmpKfqcZ0 for downloading remote URIs passed to arguments of type 'path' ===
2020/02/06 16:53:52 INFO mlflow.projects: === Created directory /var/folders/tg/vkmyk0kj2615vnmh0tz_88wr0000gp/T/tmplfAV3k for downloading remote URIs passed to arguments of type 'path' ===
2020/02/06 16:53:52 INFO mlflow.projects: === Running command 'source /Users/avesh.singh/opt/anaconda2/bin/../etc/profile.d/conda.sh && conda activate mlflow-a334bf26be61bfb18b0f661cca4057c207a14948 1>&2 && echo 'hello from test'' in run with ID '18bc7e80c6b540dda738097a100ba607' === 
2020/02/06 16:53:52 INFO mlflow.projects: === Running command 'source /Users/avesh.singh/opt/anaconda2/bin/../etc/profile.d/conda.sh && conda activate mlflow-a334bf26be61bfb18b0f661cca4057c207a14948 1>&2 && echo 'hello from build'' in run with ID '8417c45a37304573a506078e39d47f91' === 
hello from build
hello from test
2020/02/06 16:53:52 INFO mlflow.projects: === Run (ID '8417c45a37304573a506078e39d47f91') succeeded ===
2020/02/06 16:53:52 INFO mlflow.projects: === Run (ID '18bc7e80c6b540dda738097a100ba607') succeeded ===
[1]+  Done                    mlflow run -e build ~/mlflow-example-project

Re exposing mlflow (CLI and libs) methods to create Conda environments: There’s a tradeoff here between flexibility and ease-of-use. I don’t see parallel test execution as a compelling argument to add Conda creation as an mlflow method since you can create the conda environment in the test setup. But please let me know if there are other reasons for this functionality.
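
The "create the environment in the test setup, then run in parallel" idea can be sketched in Python. This is only an illustration under stated assumptions: run_setup_then_parallel is a hypothetical helper, not part of mlflow, and it needs Python 3 (the report uses 2.7). The setup command runs alone, so nothing else touches the environment while it is being built, and only then do the remaining commands start concurrently.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_setup_then_parallel(setup_cmd, parallel_cmds):
    """Run setup_cmd once, then run parallel_cmds concurrently.

    Raises subprocess.CalledProcessError if setup fails.
    Returns the exit codes of parallel_cmds, in the order given.
    """
    # The setup step runs by itself and must succeed before anything else starts.
    subprocess.run(setup_cmd, check=True)

    def exit_code(cmd):
        return subprocess.run(cmd).returncode

    # One worker per command so that all of them start right away.
    with ThreadPoolExecutor(max_workers=max(1, len(parallel_cmds))) as pool:
        return list(pool.map(exit_code, parallel_cmds))
```

In the pipeline from the original report, parallel_cmds would be the two commands ["mlflow", "run", "-e", "test", "./test-model"] and ["mlflow", "run", "-e", "build", "./test-model"]. What to use as setup_cmd depends on the project and is left open here; any single command that leaves the conda environment fully created would do.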

Re failed conda creation: I believe that if the conda env fails to be fully created, then it will be cleaned up.

A failed env creation:

(base) $: mlflow run -e build ~/mlflow-example-project 
2020/02/06 16:56:29 INFO mlflow.projects: === Creating conda environment mlflow-eb8b3286a36ea844a1c613513ade706ff2c6eeef ===
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound: 
  - pandas==2.0.0

Traceback (most recent call last):
  File "/Users/avesh.singh/opt/anaconda2/bin/mlflow", line 10, in <module>
    sys.exit(cli())
  File "/Users/avesh.singh/opt/anaconda2/lib/python2.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/Users/avesh.singh/opt/anaconda2/lib/python2.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/Users/avesh.singh/opt/anaconda2/lib/python2.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/avesh.singh/opt/anaconda2/lib/python2.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/avesh.singh/opt/anaconda2/lib/python2.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/Users/avesh.singh/opt/anaconda2/lib/python2.7/site-packages/mlflow/cli.py", line 134, in run
    run_id=run_id
  File "/Users/avesh.singh/opt/anaconda2/lib/python2.7/site-packages/mlflow/projects/__init__.py", line 286, in run
    use_conda=use_conda, storage_dir=storage_dir, synchronous=synchronous, run_id=run_id)
  File "/Users/avesh.singh/opt/anaconda2/lib/python2.7/site-packages/mlflow/projects/__init__.py", line 162, in _run
    conda_env_name = _get_or_create_conda_env(project.conda_env_path)
  File "/Users/avesh.singh/opt/anaconda2/lib/python2.7/site-packages/mlflow/projects/__init__.py", line 451, in _get_or_create_conda_env
    conda_env_path], stream_output=True)
  File "/Users/avesh.singh/opt/anaconda2/lib/python2.7/site-packages/mlflow/utils/process.py", line 38, in exec_cmd
    raise ShellCommandException("Non-zero exitcode: %s" % (exit_code))
mlflow.utils.process.ShellCommandException: Non-zero exitcode: 1

We don’t see this env in the list of conda environments:

(base) $: conda info --envs
# conda environments:
#
base                  *  /Users/avesh.singh/opt/anaconda2
mlflow-a334bf26be61bfb18b0f661cca4057c207a14948     /Users/avesh.singh/opt/anaconda2/envs/mlflow-a334bf26be61bfb18b0f661cca4057c207a14948
mlflow-dev-env           /Users/avesh.singh/opt/anaconda2/envs/mlflow-dev-env
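
The same check can be scripted rather than read by eye. The sketch below parses the JSON that `conda env list --json` prints (an object whose "envs" key holds a list of environment paths) and looks for an environment by name. The helper names are hypothetical, and the sample data simply mirrors the listing above.

```python
import json
import os


def env_names(conda_env_list_json):
    """Extract environment directory names from `conda env list --json` output."""
    prefixes = json.loads(conda_env_list_json)["envs"]
    # The name of an environment is the last component of its path.
    # Note that the base environment shows up under its install directory name.
    return [os.path.basename(p.rstrip("/")) for p in prefixes]


def env_exists(name, conda_env_list_json):
    """Return True if an environment called `name` appears in the listing."""
    return name in env_names(conda_env_list_json)


# Sample input mirroring the listing above; in practice this string would come
# from subprocess.check_output(["conda", "env", "list", "--json"]).
sample = json.dumps({"envs": [
    "/Users/avesh.singh/opt/anaconda2",
    "/Users/avesh.singh/opt/anaconda2/envs/mlflow-a334bf26be61bfb18b0f661cca4057c207a14948",
    "/Users/avesh.singh/opt/anaconda2/envs/mlflow-dev-env",
]})

failed = "mlflow-eb8b3286a36ea844a1c613513ade706ff2c6eeef"
print(env_exists(failed, sample))
```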

AveshCSingh on Feb 7, 2020