mlflow: [BUG]Failed to start the server using MLflow authentication. error: Reason: Worker failed to boot.

Issues Policy acknowledgement

  • I have read and agree to submit bug reports in accordance with the issues policy

Willingness to contribute

No. I cannot contribute a bug fix at this time.

MLflow version

  • Client: 2.5.0
  • Tracking server: 2.5.0

System information

Describe the problem

I am using a Docker image and updated MLflow to version 2.5.0 within it. When I try to start MLflow with the command ‘mlflow server --app-name basic-auth’, it fails.

Tracking information

System information: Linux #1 SMP PREEMPT Thu May 25 07:27:39 UTC 2023
Python version: 3.9.14
MLflow version: 2.5.0
MLflow module location: /usr/local/python-3.9.14/lib/python3.9/site-packages/mlflow/__init__.py
Tracking URI: file:///opt/apps/mlruns
Registry URI: file:///opt/apps/mlruns
MLflow dependencies:
  Flask: 2.3.2
  Jinja2: 3.1.2
  alembic: 1.11.1
  click: 8.1.3
  cloudpickle: 2.2.1
  databricks-cli: 0.17.7
  docker: 6.1.3
  entrypoints: 0.4
  gitpython: 3.1.31
  gunicorn: 20.1.0
  importlib-metadata: 6.6.0
  markdown: 3.4.3
  matplotlib: 3.7.1
  numpy: 1.22.4
  packaging: 23.1
  pandas: 1.4.2
  protobuf: 3.20.1
  pyarrow: 12.0.1
  pytz: 2022.1
  pyyaml: 6.0
  querystring-parser: 1.2.4
  requests: 2.27.1
  scikit-learn: 1.1.1
  scipy: 1.6.1
  sqlalchemy: 2.0.16
  sqlparse: 0.4.4

Code to reproduce issue

docker pull adacotechjp/mlflow
docker run --rm --name mlflow-container -it -p 8001:5000 adacotechjp/mlflow:2.4.0 bash
pip install -U mlflow
mlflow server --app-name basic-auth

Stack trace

[2023-07-25 09:24:52 +0000] [45] [INFO] Starting gunicorn 20.1.0
[2023-07-25 09:24:52 +0000] [45] [INFO] Listening at: http://127.0.0.1:5000 (45)
[2023-07-25 09:24:52 +0000] [45] [INFO] Using worker: sync
[2023-07-25 09:24:52 +0000] [46] [INFO] Booting worker with pid: 46
[2023-07-25 09:24:52 +0000] [47] [INFO] Booting worker with pid: 47
[2023-07-25 09:24:53 +0000] [48] [INFO] Booting worker with pid: 48
[2023-07-25 09:24:53 +0000] [49] [INFO] Booting worker with pid: 49
2023/07/25 09:24:54 WARNING mlflow.server.auth: This feature is still experimental and may change in a future release without warning
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> 8606fa83a998, initial_migration
2023/07/25 09:24:54 WARNING mlflow.server.auth: This feature is still experimental and may change in a future release without warning
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> 8606fa83a998, initial_migration
2023/07/25 09:24:54 WARNING mlflow.server.auth: This feature is still experimental and may change in a future release without warning
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> 8606fa83a998, initial_migration
2023/07/25 09:24:54 WARNING mlflow.server.auth: This feature is still experimental and may change in a future release without warning
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> 8606fa83a998, initial_migration
[2023-07-25 09:24:54 +0000] [45] [WARNING] Worker with pid 48 was terminated due to signal 15
[2023-07-25 09:24:54 +0000] [45] [WARNING] Worker with pid 49 was terminated due to signal 15
[2023-07-25 09:24:54 +0000] [45] [WARNING] Worker with pid 47 was terminated due to signal 15
[2023-07-25 09:24:54 +0000] [45] [INFO] Shutting down: Master
[2023-07-25 09:24:54 +0000] [45] [INFO] Reason: Worker failed to boot.
Running the mlflow server failed. Please see the logs above for details.

Other info / logs

None

What component(s) does this bug affect?

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/gateway: AI Gateway service, Gateway client APIs, third-party Gateway integrations
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

What language(s) does this bug affect?

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

About this issue

  • Original URL
  • State: open
  • Created a year ago
  • Comments: 20 (4 by maintainers)

Most upvoted comments

Hi, I got exactly the same issue.

I believe these errors are largely rooted in the heterogeneous design of the auth DB config.

To be specific, why do we have to set up an .ini file and point to it using MLFLOW_AUTH_CONFIG_PATH, whereas all the existing DB-related options, --backend-store-uri and --registry-store-uri, simply take a URI as an argument?

Just as with --registry-store-uri, introducing something like --auth-store-uri (defaulting to --backend-store-uri) would make this much less error-prone, because we could share the same codebase.

https://github.com/mlflow/mlflow/blob/304ca3d2598dee681b311ad6add85c7b51f482fe/mlflow/cli.py#L292-L300

I was having the same problem as @wwwwf with the remote permissions database, but I think I found a solution.

To be honest, I did a couple of things and I’m not sure whether all of them are required. This is just what I did, and it’s currently working.

I looked at the code and apparently there is an argument called authorization_function, so I added that to basic_auth.ini.

This is my setup:

basic_auth.ini:

[mlflow]
default_permission = READ
database_uri = <postgresql-uri>
admin_username = admin
admin_password = password
authorization_function = mlflow.server.auth:authenticate_request_basic_auth

Dockerfile:

FROM python:3.9-buster

# Set the working directory for the container
WORKDIR /

# Copy files
COPY basic_auth.ini basic_auth.ini
COPY requirements.txt requirements.txt 
COPY server.sh server.sh

# Install requirements
RUN pip install --upgrade pip \
    && pip install -r requirements.txt

EXPOSE 8080

RUN chmod +x server.sh

ENTRYPOINT ["./server.sh"]

I’m using a bash script to run the server.

server.sh:

#!/bin/bash

export MLFLOW_AUTH_CONFIG_PATH=basic_auth.ini

mlflow server \
  --host <my_host> \
  --port <my_port> \
  --backend-store-uri <postgresql-uri> \
  --artifacts-destination <bucket-uri> \
  --app-name basic-auth
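
Since server.sh points MLFLOW_AUTH_CONFIG_PATH at a relative path, it can help to check the file before starting the server. Below is a minimal sketch using only the Python standard library. The helper name check_auth_config is hypothetical, and treating the four keys below as required is an assumption based on the basic_auth.ini shown above, not on MLflow's own validation.

```python
import configparser
import os

# Keys copied from the basic_auth.ini in this comment.
# Assumption: all four are needed; MLflow itself may treat them differently.
EXPECTED_KEYS = ("default_permission", "database_uri", "admin_username", "admin_password")


def check_auth_config(path):
    """Return a list of problems found in the auth config at `path` (empty if none)."""
    if not os.path.isfile(path):
        # A relative path is resolved against the current working directory.
        return [f"config file not found: {os.path.abspath(path)}"]

    # interpolation=None so that '%' in passwords or URIs is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)
    if not parser.has_section("mlflow"):
        return ["missing [mlflow] section"]

    section = parser["mlflow"]
    return [
        f"missing or empty key: {key}"
        for key in EXPECTED_KEYS
        if not section.get(key, "").strip()
    ]


if __name__ == "__main__":
    config_path = os.environ.get("MLFLOW_AUTH_CONFIG_PATH", "basic_auth.ini")
    for problem in check_auth_config(config_path):
        print(problem)
```

Running this inside the container, from the same working directory as server.sh, would show whether the relative path resolves to the file that was copied in.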

There is one more step that I did. I’m not really sure whether it makes sense, but I was trying a bunch of things and then it started working, so it might be necessary.

With the setup above I was not getting any errors when running the container, but the UI wasn’t opening. I was using a PostgreSQL instance on Google Cloud Platform with an empty database that I was trying to reference in basic_auth.ini. So, basically, I populated this empty database with the contents of the basic_auth.db database that MLflow creates when you run authentication locally, and then… it worked?

I do agree with @kbumsik, it would be easier to just pass the database URI to --app-name as we do with --backend-store-uri for example.

@mlflow/mlflow-team Could the MLflow team please take a look at this?

I debugged the code and found that the problem was in the table-creation step, so I created the table manually in MySQL, following the table that SQLite creates, and then my MLflow service started without problems.
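
For anyone who wants to try the same workaround, here is a rough sketch of how the CREATE statements could be read out of a locally generated SQLite file, so they can be adapted by hand for MySQL or PostgreSQL. The function name dump_schema is hypothetical, and the file name basic_auth.db is taken from the earlier comment in this thread. SQLite DDL will most likely need adjusting (types, auto-increment syntax) before another database accepts it.

```python
import sqlite3


def dump_schema(sqlite_path):
    """Return the CREATE statements stored in a SQLite database file.

    Note: sqlite3.connect creates an empty file if `sqlite_path` does not exist,
    in which case the returned list is empty.
    """
    conn = sqlite3.connect(sqlite_path)
    try:
        rows = conn.execute(
            "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    # Assumption: basic_auth.db is in the current directory.
    for statement in dump_schema("basic_auth.db"):
        print(statement + ";")
```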

@wwwwf Got it. Can you run mlflow server with --workers=1?

mlflow server --app-name basic-auth --workers 1
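
One possible reading of the stack trace, not confirmed in this thread, is that each of the four gunicorn workers runs the same initial migration against the same auth database, and the later ones fail because the table already exists. The sketch below only simulates that idea with plain sqlite3. It does not use MLflow or alembic, it runs the "workers" one after another rather than concurrently, and the table name and helper functions are made up for illustration.

```python
import os
import sqlite3
import tempfile


def run_initial_migration(db_path):
    """Stand-in for a non-idempotent initial migration: a plain CREATE TABLE."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
        conn.commit()
    finally:
        conn.close()


def simulate_workers(n_workers):
    """Run the migration once per simulated worker against one shared database.

    Returns a list of (worker index, error message) for every worker that failed.
    """
    failures = []
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "auth.db")
        for index in range(n_workers):
            try:
                run_initial_migration(db_path)
            except sqlite3.OperationalError as exc:
                failures.append((index, str(exc)))
    return failures


if __name__ == "__main__":
    print("1 worker :", simulate_workers(1))
    print("4 workers:", simulate_workers(4))
```

With a single simulated worker nothing fails, while with four every worker after the first hits an error. If the real failure is of this kind, that would be consistent with --workers 1 being a useful thing to try.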