mlflow: [BUG] MLflow commands fail to execute concurrently
#1769
# System information
- Have I written custom code (as opposed to using a stock example script provided in MLflow): Nope
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Jenkins agent, miniconda docker running on linuxkit
- MLflow installed from (source or binary): binary
- MLflow version (run mlflow --version): 0.9.1 (but the same for 1.4.0)
- Python version: 2.7
- npm version, if running the dev UI:
- Exact command to reproduce (both run concurrently):
  mlflow run -e test ./test-model
  mlflow run -e build ./test-model
Describe the problem
I am running a CI pipeline on Jenkins. Since the tests and training take a long time, I wanted to parallelize them. The two mlflow commands fail when executed concurrently, but pass when called sequentially.
Code to reproduce issue
stage('Parallel') {
    failFast true
    parallel {
        stage('MLFlow test') {
            steps {
                sh 'mlflow run -e test ./test-model'
            }
        }
        stage('MLFlow build') {
            steps {
                sh 'mlflow run -e build ./test-model'
            }
        }
    }
}
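To check whether the failure depends on Jenkins at all, the same two commands can be started side by side from a small script. This is only a sketch: `run_concurrently` is a hypothetical helper name, and it assumes `mlflow` is on the PATH and `./test-model` exists in the working directory.

```python
import subprocess


def run_concurrently(commands):
    """Start all commands without waiting, then collect exit codes in order."""
    # Popen returns immediately, so every child is running before we wait on any.
    procs = [subprocess.Popen(argv) for argv in commands]
    return [proc.wait() for proc in procs]


# The two entry points from the report (requires mlflow and ./test-model):
# codes = run_concurrently([
#     ["mlflow", "run", "-e", "test", "./test-model"],
#     ["mlflow", "run", "-e", "build", "./test-model"],
# ])
```

If both exit codes are 0 here but the Jenkins stage still fails, the agent's resources or container limits are a more likely suspect than mlflow itself.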
Other info / logs
2019/12/06 16:05:59 INFO mlflow.projects: === Creating conda environment mlflow-dee3cb9d8ec1d078ac23cda04ee09d2ead6fbbbd ===
Collecting package metadata (repodata.json): ...working... Traceback (most recent call last):
File "/opt/conda/bin/mlflow", line 10, in <module>
sys.exit(cli())
File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/opt/conda/lib/python2.7/site-packages/mlflow/cli.py", line 139, in run
run_id=run_id,
File "/opt/conda/lib/python2.7/site-packages/mlflow/projects/__init__.py", line 230, in run
storage_dir=storage_dir, block=block, run_id=run_id)
File "/opt/conda/lib/python2.7/site-packages/mlflow/projects/__init__.py", line 132, in _run
conda_env_name = _get_or_create_conda_env(project.conda_env_path)
File "/opt/conda/lib/python2.7/site-packages/mlflow/projects/__init__.py", line 462, in _get_or_create_conda_env
conda_env_path], stream_output=True)
File "/opt/conda/lib/python2.7/site-packages/mlflow/utils/process.py", line 38, in exec_cmd
raise ShellCommandException("Non-zero exitcode: %s" % (exit_code))
mlflow.utils.process.ShellCommandException: Non-zero exitcode: -9
script returned exit code 1
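A note on reading `Non-zero exitcode: -9`: assuming `exec_cmd` reports the return code from Python's `subprocess` module, a negative value on POSIX means the child was terminated by a signal rather than exiting on its own. The sketch below (Python 3, with the hypothetical helper `describe_exit_code`) decodes such a value. The log does not say what sent the signal.

```python
import signal


def describe_exit_code(code):
    """Explain a subprocess return code as Python reports it on POSIX."""
    if code < 0:
        # A return code of -N means the child was terminated by signal N.
        return "killed by %s" % signal.Signals(-code).name
    if code == 0:
        return "exited successfully"
    return "exited with status %d" % code


print(describe_exit_code(-9))  # killed by SIGKILL
```

So the `conda env create` child was killed with SIGKILL, which a process cannot catch or ignore. The trailing `script returned exit code 1` is the exit status of the `mlflow` command itself after it raised the exception.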
About this issue
- State: closed
- Created 5 years ago
- Comments: 15 (5 by maintainers)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
OK, thank you. That sounds like a workaround for us.
But I would still prefer that mlflow parse its own MLproject format, rather than us having to figure out which conda environment to create.
Ah, my mistake: I ran the commands concurrently but copy-pasted the incorrect output. It should be:
Re exposing mlflow (CLI and libs) methods to create Conda environments: There’s a tradeoff here between flexibility and ease-of-use. I don’t see parallel test execution as a compelling argument to add Conda creation as an mlflow method since you can create the conda environment in the test setup. But please let me know if there are other reasons for this functionality.
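A rough sketch of what creating the environment in the test setup could look like, run once before the parallel stages start. It rests on an assumption that should be checked against the installed mlflow version: that the environment is named `mlflow-` followed by the SHA-1 of the conda file's contents, which is consistent with the 40-hex-digit name in the log above. All function names here are hypothetical.

```python
import hashlib
import json
import os
import subprocess


def expected_env_name(conda_yaml_text):
    # ASSUMPTION: mlflow derives the name from a SHA-1 of the conda file contents.
    digest = hashlib.sha1(conda_yaml_text.encode("utf-8")).hexdigest()
    return "mlflow-" + digest


def create_command(env_name, conda_yaml_path):
    """Build the conda invocation that creates the environment from the file."""
    return ["conda", "env", "create", "-n", env_name, "--file", conda_yaml_path]


def existing_env_names(conda_env_list_json):
    """Turn the output of `conda env list --json` into a set of directory names."""
    paths = json.loads(conda_env_list_json).get("envs", [])
    return {os.path.basename(path.rstrip("/")) for path in paths}


def ensure_env(conda_yaml_path):
    """Create the environment serially if it is not already listed by conda."""
    with open(conda_yaml_path) as handle:
        name = expected_env_name(handle.read())
    listing = subprocess.check_output(["conda", "env", "list", "--json"])
    if name not in existing_env_names(listing.decode("utf-8")):
        subprocess.check_call(create_command(name, conda_yaml_path))
    return name
```

Calling `ensure_env("./test-model/conda.yaml")` in a stage before `parallel` (the file name is an assumption about the project layout) would mean both `mlflow run` invocations find the environment already in place instead of racing to create it.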
Re failed conda creation: I believe that if the conda env fails to be fully created, then it will be cleaned up.
A failed env creation:
We don’t see this env in the list of conda environments: