mlflow: [BUG] mlflow logs pytorch model instead of weights only -> prevents serving modularized code
Willingness to contribute
The MLflow Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the MLflow code base?
- Yes. I can contribute a fix for this bug independently.
- Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
- No. I cannot contribute a bug fix at this time.
System information
- Have I written custom code (as opposed to using a stock example script provided in MLflow): Yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Arch Linux
- MLflow installed from (source or binary): binary
- MLflow version (run `mlflow --version`): 1.10.0
- Python version: 3.7.6
- Exact command to reproduce: `mlflow models serve -m MODELPATH`
Describe the problem
I successfully trained a model. Now, when trying to serve it I run into:
```
zeth@master /tmp> mlflow models serve -m /tmp/exploding_springfield/mlruns/0/f7c632e43f93437280cc72b88f279a56/artifacts/models (base)
2020/08/11 14:47:26 INFO mlflow.models.cli: Selected backend for flavor 'python_function'
2020/08/11 14:47:28 INFO mlflow.pyfunc.backend: === Running command 'source /home/zeth/anaconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-dd325f076f6465c8205b2342fd8ab4531e905e1a 1>&2 && gunicorn --timeout=60 -b 127.0.0.1:5000 -w 1 ${GUNICORN_CMD_ARGS} -- mlflow.pyfunc.scoring_server.wsgi:app'
[2020-08-11 14:47:28 +0200] [20429] [INFO] Starting gunicorn 20.0.4
[2020-08-11 14:47:28 +0200] [20429] [INFO] Listening at: http://127.0.0.1:5000 (20429)
[2020-08-11 14:47:28 +0200] [20429] [INFO] Using worker: sync
[2020-08-11 14:47:28 +0200] [20435] [INFO] Booting worker with pid: 20435
[2020-08-11 14:47:29 +0200] [20435] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/home/zeth/anaconda3/lib/python3.7/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
    worker.init_process()
  File "/home/zeth/anaconda3/lib/python3.7/site-packages/gunicorn/workers/base.py", line 119, in init_process
    self.load_wsgi()
  File "/home/zeth/anaconda3/lib/python3.7/site-packages/gunicorn/workers/base.py", line 144, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/home/zeth/anaconda3/lib/python3.7/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/home/zeth/anaconda3/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 49, in load
    return self.load_wsgiapp()
  File "/home/zeth/anaconda3/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 39, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/home/zeth/anaconda3/lib/python3.7/site-packages/gunicorn/util.py", line 358, in import_app
    mod = importlib.import_module(module)
  File "/home/zeth/anaconda3/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/zeth/anaconda3/lib/python3.7/site-packages/mlflow/pyfunc/scoring_server/wsgi.py", line 6, in <module>
    app = scoring_server.init(load_model(os.environ[scoring_server._SERVER_MODEL_PATH]))
  File "/home/zeth/anaconda3/lib/python3.7/site-packages/mlflow/pyfunc/__init__.py", line 473, in load_model
    model_impl = importlib.import_module(conf[MAIN])._load_pyfunc(data_path)
  File "/home/zeth/anaconda3/lib/python3.7/site-packages/mlflow/pytorch/__init__.py", line 423, in _load_pyfunc
    return _PyTorchWrapper(_load_model(path, **kwargs))
  File "/home/zeth/anaconda3/lib/python3.7/site-packages/mlflow/pytorch/__init__.py", line 331, in _load_model
    import torch
ModuleNotFoundError: No module named 'torch'
[2020-08-11 14:47:29 +0200] [20435] [INFO] Worker exiting (pid: 20435)
[2020-08-11 14:47:29 +0200] [20429] [INFO] Shutting down: Master
[2020-08-11 14:47:29 +0200] [20429] [INFO] Reason: Worker failed to boot.
Traceback (most recent call last):
  File "/home/zeth/anaconda3/bin/mlflow", line 8, in <module>
    sys.exit(cli())
  File "/home/zeth/anaconda3/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/zeth/anaconda3/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/zeth/anaconda3/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/zeth/anaconda3/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/zeth/anaconda3/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/zeth/anaconda3/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/zeth/anaconda3/lib/python3.7/site-packages/mlflow/models/cli.py", line 59, in serve
    host=host)
  File "/home/zeth/anaconda3/lib/python3.7/site-packages/mlflow/pyfunc/backend.py", line 92, in serve
    command_env=command_env)
  File "/home/zeth/anaconda3/lib/python3.7/site-packages/mlflow/pyfunc/backend.py", line 172, in _execute_in_conda_env
    command, rc
Exception: Command 'source /home/zeth/anaconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-dd325f076f6465c8205b2342fd8ab4531e905e1a 1>&2 && gunicorn --timeout=60 -b 127.0.0.1:5000 -w 1 ${GUNICORN_CMD_ARGS} -- mlflow.pyfunc.scoring_server.wsgi:app' returned non zero return code. Return code = 3
```
The conda.yaml file is not broken:
```yaml
channels:
  - defaults
  - conda-forge
  - pytorch
dependencies:
  - python=3.7.7
  - pytorch=1.6.0
  - torchvision=0.7.0
  - pip
  - pip:
      - mlflow
      - cloudpickle==1.5.0
name: mlflow-env
```
And the Conda environment contains torch as well (verified).
I expect the model to serve without any issues.
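The title's concern (logging the full model object rather than weights only) can be illustrated without torch or MLflow. Below is a minimal stdlib-only sketch; `MyNet` is a hypothetical stand-in for a user-defined `nn.Module` subclass that lives in the project's own `model.py`:

```python
import pickle

# Hypothetical stand-in for a user-defined model class kept in the
# project's own code (e.g. model.py). No torch/MLflow here; this only
# illustrates the serialization behaviour of pickling a whole object.
class MyNet:
    def __init__(self):
        self.weights = [0.1, 0.2, 0.3]

blob = pickle.dumps(MyNet())

# The pickle stream records the *import path* of the class, not its
# source code, so the deserializing process must be able to import
# MyNet (and, for a real model, torch) again.
assert b"MyNet" in blob

# Simulate a serving process that cannot resolve the class definition:
del MyNet
restore_error = None
try:
    pickle.loads(blob)
except AttributeError as exc:
    restore_error = exc
print("deserialization failed:", restore_error)
```

This is the failure mode that makes serving modularized code fragile: the artifact is only loadable in an environment that can re-import every class (and library) referenced by the pickled object graph.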
Code to reproduce issue
Difficult to share, but if required I can absolutely do so.
What component(s), interfaces, languages, and integrations does this bug affect?
Components
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/scoring: Local serving, model deployment tools, Spark UDFs
About this issue
- Original URL
- State: open
- Created 4 years ago
- Comments: 24 (24 by maintainers)
I was able to reproduce the same issue for PyTorch.

Folder structure:

```
code
    model.py
    train.py
    load.py
```
How to reproduce the error:
output:
I guess so. It seems MLflow currently logs a PyTorch model in an unrecommended way.
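For contrast, the weights-only pattern recommended by PyTorch serializes just a dict of parameters (`model.state_dict()`), which round-trips without the defining class being importable. A stdlib-only sketch with illustrative parameter names:

```python
import pickle

# Weights-only serialization: a plain dict of parameter values, analogous
# to torch.save(model.state_dict(), path). No class reference is embedded
# in the pickle, so the consumer only needs to rebuild the architecture
# itself and then load the weights (analogous to model.load_state_dict()).
state = {"fc.weight": [0.1, 0.2, 0.3], "fc.bias": [0.0]}
blob = pickle.dumps(state)

restored = pickle.loads(blob)
assert restored == state  # round-trips with no model class in scope
```

Under this scheme, the serving side pairs the restored weights with model code it imports from its own (modularized) project, instead of depending on the training process's pickled object graph.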