mlflow: [BUG] Failed to start the server using MLflow authentication. Error: Reason: Worker failed to boot.
Issues Policy acknowledgement
- I have read and agree to submit bug reports in accordance with the issues policy
Willingness to contribute
No. I cannot contribute a bug fix at this time.
MLflow version
- Client: 2.5.0
- Tracking server: 2.5.0
System information
- Docker image: https://hub.docker.com/r/adacotechjp/mlflow
- Linux Ubuntu 22.04
- Python 3.9.14
Describe the problem
I am using a Docker image and updated MLflow to version 2.5.0 within it. When I try to start MLflow with the command 'mlflow server --app-name basic-auth', it fails.
Tracking information
System information: Linux #1 SMP PREEMPT Thu May 25 07:27:39 UTC 2023
Python version: 3.9.14
MLflow version: 2.5.0
MLflow module location: /usr/local/python-3.9.14/lib/python3.9/site-packages/mlflow/__init__.py
Tracking URI: file:///opt/apps/mlruns
Registry URI: file:///opt/apps/mlruns
MLflow dependencies:
Flask: 2.3.2
Jinja2: 3.1.2
alembic: 1.11.1
click: 8.1.3
cloudpickle: 2.2.1
databricks-cli: 0.17.7
docker: 6.1.3
entrypoints: 0.4
gitpython: 3.1.31
gunicorn: 20.1.0
importlib-metadata: 6.6.0
markdown: 3.4.3
matplotlib: 3.7.1
numpy: 1.22.4
packaging: 23.1
pandas: 1.4.2
protobuf: 3.20.1
pyarrow: 12.0.1
pytz: 2022.1
pyyaml: 6.0
querystring-parser: 1.2.4
requests: 2.27.1
scikit-learn: 1.1.1
scipy: 1.6.1
sqlalchemy: 2.0.16
sqlparse: 0.4.4
Code to reproduce issue
docker pull adacotechjp/mlflow
docker run --rm --name mlflow-container -it -p 8001:5000 adacotechjp/mlflow:2.4.0 bash
pip install -U mlflow
mlflow server --app-name basic-auth
Stack trace
[2023-07-25 09:24:52 +0000] [45] [INFO] Starting gunicorn 20.1.0
[2023-07-25 09:24:52 +0000] [45] [INFO] Listening at: http://127.0.0.1:5000 (45)
[2023-07-25 09:24:52 +0000] [45] [INFO] Using worker: sync
[2023-07-25 09:24:52 +0000] [46] [INFO] Booting worker with pid: 46
[2023-07-25 09:24:52 +0000] [47] [INFO] Booting worker with pid: 47
[2023-07-25 09:24:53 +0000] [48] [INFO] Booting worker with pid: 48
[2023-07-25 09:24:53 +0000] [49] [INFO] Booting worker with pid: 49
2023/07/25 09:24:54 WARNING mlflow.server.auth: This feature is still experimental and may change in a future release without warning
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Running upgrade -> 8606fa83a998, initial_migration
2023/07/25 09:24:54 WARNING mlflow.server.auth: This feature is still experimental and may change in a future release without warning
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Running upgrade -> 8606fa83a998, initial_migration
2023/07/25 09:24:54 WARNING mlflow.server.auth: This feature is still experimental and may change in a future release without warning
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Running upgrade -> 8606fa83a998, initial_migration
2023/07/25 09:24:54 WARNING mlflow.server.auth: This feature is still experimental and may change in a future release without warning
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Running upgrade -> 8606fa83a998, initial_migration
[2023-07-25 09:24:54 +0000] [45] [WARNING] Worker with pid 48 was terminated due to signal 15
[2023-07-25 09:24:54 +0000] [45] [WARNING] Worker with pid 49 was terminated due to signal 15
[2023-07-25 09:24:54 +0000] [45] [WARNING] Worker with pid 47 was terminated due to signal 15
[2023-07-25 09:24:54 +0000] [45] [INFO] Shutting down: Master
[2023-07-25 09:24:54 +0000] [45] [INFO] Reason: Worker failed to boot.
Running the mlflow server failed. Please see the logs above for details.
Other info / logs
None
What component(s) does this bug affect?
- area/artifacts: Artifact stores and artifact logging
- area/build: Build and test infrastructure for MLflow
- area/docs: MLflow documentation pages
- area/examples: Example code
- area/gateway: AI Gateway service, Gateway client APIs, third-party Gateway integrations
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
- area/projects: MLproject format, project running backends
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/server-infra: MLflow Tracking server backend
- area/tracking: Tracking Service, tracking client APIs, autologging
What interface(s) does this bug affect?
- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
- area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- area/windows: Windows support
What language(s) does this bug affect?
- language/r: R APIs and clients
- language/java: Java APIs and clients
- language/new: Proposals for new client languages
What integration(s) does this bug affect?
- integrations/azure: Azure and Azure ML integrations
- integrations/sagemaker: SageMaker integrations
- integrations/databricks: Databricks integrations
About this issue
- State: open
- Created a year ago
- Comments: 20 (4 by maintainers)
Hi, I got exactly the same issue.
I believe these errors are largely rooted in the heterogeneous design of the auth DB config.
To be specific, why is it required to set up an .ini file and point to it using MLFLOW_AUTH_CONFIG_PATH, whereas all existing DB-related configs, --backend-store-uri and --registry-store-uri, simply pass a URI as an argument?
Just like what --registry-store-uri did, introducing something like --auth-store-uri (and using --backend-store-uri by default) would make it much less error-prone because we can share the same codebase.
https://github.com/mlflow/mlflow/blob/304ca3d2598dee681b311ad6add85c7b51f482fe/mlflow/cli.py#L292-L300
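For readers who have not set this up, here is a minimal sketch of the .ini-plus-environment-variable arrangement being discussed. The section and key names follow the default basic_auth.ini as I understand it, so please check them against your MLflow version. The helper name write_auth_config, the file path, the credentials, and the database URI are all placeholders of mine, not values from this thread.

```python
import configparser
import os


def write_auth_config(path: str, database_uri: str) -> str:
    """Write a basic-auth config file and point MLFLOW_AUTH_CONFIG_PATH at it.

    Hypothetical helper: key names are assumed to match MLflow's default
    basic_auth.ini; the credentials below are placeholders.
    """
    # interpolation=None so that '%' characters in a database URI
    # (e.g. a URL-encoded password) are written and read back literally.
    config = configparser.ConfigParser(interpolation=None)
    config["mlflow"] = {
        "default_permission": "READ",
        "database_uri": database_uri,
        "admin_username": "admin",
        "admin_password": "change-me",
    }
    with open(path, "w") as f:
        config.write(f)

    # Only affects this process and any child processes it starts.
    os.environ["MLFLOW_AUTH_CONFIG_PATH"] = path
    return path


if __name__ == "__main__":
    write_auth_config("basic_auth.ini", "sqlite:///basic_auth.db")
    print(open("basic_auth.ini").read())
```

Because the environment variable is set only for the current process, the server would need to be launched from that same process (for example via subprocess) or the variable exported in the shell that runs mlflow server.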
I was having the same problem as @wwwwf with the remote permissions database, but I think I found a solution.
To be honest, I did a couple of things and I’m not sure whether all of them are required; it’s just what I did, and it’s currently working.
I looked at the code and apparently there is an argument called authorization_function, so I added that to basic_auth.ini.
This is my setup:
basic_auth.ini:
Dockerfile:
I’m using a bash script to run the server.
server.sh:
There is another step that I did. I’m not really sure whether it makes sense, but I was trying a bunch of things and then it started working, so it might be necessary. With the setup above I was not getting any errors running the container, but the UI wasn’t opening. I was using a PostgreSQL instance on Google Cloud Platform with an empty database that I was trying to reference in basic_auth.ini. So, basically, I populated this empty database with the content of the basic_auth.db database that MLflow creates when you run authentication locally, and then… it worked?
I do agree with @kbumsik, it would be easier to just pass the database URI to --app-name as we do with --backend-store-uri for example.
@mlflow/mlflow-team can the MLflow team please help check this?
I debugged the code and found that the problem was in the table-creation step, so I created the tables manually in MySQL, following the schema that SQLite creates, and then my MLflow service started smoothly.
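If anyone wants to try the same workaround, a rough sketch of how the schema could be read out of a locally created SQLite file is below. It only uses the standard sqlite3 module and the sqlite_master catalog. The function name dump_table_ddl and the basic_auth.db path are my own placeholders, and the statements come out in SQLite's dialect, so they would likely need adjusting by hand before running them on MySQL or PostgreSQL.

```python
import sqlite3


def dump_table_ddl(sqlite_path: str) -> dict[str, str]:
    """Return {table_name: CREATE TABLE statement} for the tables in a SQLite file.

    Hypothetical helper: reads the DDL that SQLite stored when the tables
    were created; internal sqlite_* tables are skipped.
    """
    conn = sqlite3.connect(sqlite_path)
    try:
        rows = conn.execute(
            "SELECT name, sql FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        ).fetchall()
    finally:
        conn.close()
    return {name: sql for name, sql in rows}


if __name__ == "__main__":
    # Assumes a basic_auth.db created by a local run sits in the current directory.
    for table, ddl in dump_table_ddl("basic_auth.db").items():
        print(f"-- {table}\n{ddl};\n")
```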
@wwwwf Got it. Can you run mlflow server with --workers=1?
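One possible reading of why a single worker might help, which is not confirmed anywhere in this thread: the log above shows "Running upgrade -> 8606fa83a998, initial_migration" once per worker, so several processes may be trying to create the same tables at the same time. The sketch below is not MLflow code. It is a toy simulation with plain sqlite3, where the table name users and both function names are made up, showing how a check-then-create sequence can fail once more than one worker runs it.

```python
import sqlite3
import tempfile
from pathlib import Path


def run_initial_migration(conn: sqlite3.Connection) -> None:
    """Stand-in for an initial migration: creates a table unconditionally."""
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.commit()


def simulate_workers(db_path: str, n_workers: int) -> list[str]:
    """Every worker inspects the schema first, then all of them act on that stale view."""
    conns = [sqlite3.connect(db_path) for _ in range(n_workers)]

    # Step 1: each worker checks whether the table exists (none has migrated yet).
    needs_migration = [
        len(
            c.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' AND name = 'users'"
            ).fetchall()
        )
        == 0
        for c in conns
    ]

    # Step 2: each worker migrates based on what it saw in step 1.
    outcomes = []
    for conn, needed in zip(conns, needs_migration):
        try:
            if needed:
                run_initial_migration(conn)
            outcomes.append("ok")
        except sqlite3.OperationalError as exc:
            outcomes.append(f"failed: {exc}")
        finally:
            conn.close()
    return outcomes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print("4 workers:", simulate_workers(str(Path(tmp) / "four.db"), 4))
        print("1 worker: ", simulate_workers(str(Path(tmp) / "one.db"), 1))
```

In this toy version the first worker succeeds and the later ones report a failure, while the single-worker run has nothing to collide with. Whether the real server fails for the same reason would need to be confirmed from the worker tracebacks.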