simpletransformers: ClassificationModel: predict() hangs forever in uwsgi worker

Describe the bug

When model.predict() is invoked in a uwsgi worker, it never returns (it hangs on the line outputs = model(**inputs)).

To Reproduce Steps to reproduce the behavior:

  • Train a roberta-base model with simpletransformers 0.48.9 (a minimal training sketch is shown after this list)
  • Run a uwsgi + flask server that loads the model with {"use_multiprocessing": False} before spawning workers, and then runs model.predict() when it receives a request (I used the docker image tiangolo/uwsgi-nginx-flask as a base and installed transformers, pytorch, and simpletransformers)
  • Send a request; it hangs on the line outputs = model(**inputs)
  • However, if model.predict() is called on the same server before the uwsgi workers are spawned (when the server loads, as opposed to when responding to a request), it returns normally with the expected result.
  • Another way for predict() to return normally is to load the model inside each worker, meaning the first request handled by each worker is delayed by loading the model.

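For reference, here is a minimal training sketch along these lines (the toy data, label values, and output directory below are made-up placeholders rather than my actual setup):

import pandas as pd
from simpletransformers.classification import ClassificationModel

# Hypothetical two-label toy data; replace with the real training set
train_df = pd.DataFrame(
    [["this is great", 1], ["this is terrible", 0]],
    columns=["text", "labels"],
)

model_args = {"use_multiprocessing": False, "output_dir": "model/"}
model = ClassificationModel("roberta", "roberta-base", use_cuda=False, args=model_args)
model.train_model(train_df)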
Desktop (please complete the following information):

  • Docker image with Debian Buster + python 3.8 + flask + nginx + uwsgi
  • transformers version 3.3.1
  • simpletransformers version 0.48.9
  • torch version 1.6.0
  • uwsgi: tested with versions 2.0.17, 2.0.18, 2.0.19, 2.0.19.1

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 20 (4 by maintainers)

Most upvoted comments

I had the same problem and have now solved it.

My args dict looks like this:

args={"use_multiprocessing": False, "use_multiprocessing_for_evaluation": False, "process_count": 1}

Setting use_multiprocessing=False should fix it.
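For example, roughly like this when constructing the model (a minimal sketch; the 'model/' path and use_cuda=False are just the values used elsewhere in this thread, not a definitive setup):

from simpletransformers.classification import ClassificationModel

args = {"use_multiprocessing": False, "use_multiprocessing_for_evaluation": False, "process_count": 1}
model = ClassificationModel("roberta", "model/", use_cuda=False, args=args)

# predict() takes a list of texts and returns (predictions, raw_outputs)
predictions, raw_outputs = model.predict(["some text to classify"])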

@jmeisele I use uwsgi (wsgi).

To delay the model loading so that it happens inside each worker, you can use a singleton:

  • classifier.py (with a very basic lazy singleton):
from simpletransformers.classification import ClassificationModel

model = None

def get_model():
    global model
    if model is None:
        model_args = {"use_multiprocessing": False}
        model = ClassificationModel('roberta', 'model/', args=model_args)
    return model

# get_model()  # If you un-comment this line, the model will be created before the workers are spawned. If you leave it commented, it will be created the first time `predict` is invoked

def predict(text):
    cl_model = get_model()
    predictions, raw_outputs = cl_model.predict([text])
    # here goes your handling of the output
    return predictions, raw_outputs
  • In my main.py file, referenced in uwsgi.ini:
from flask import Flask
from classifier import predict 

app = Flask(__name__)

@app.route('/prediction/<text>', methods=['GET'])
def predict_get(text):
    predictions, raw_outputs = predict(text)
    # Flask views must return a response, so return the predicted label
    return str(predictions[0])

But I am still unsure if this is the proper way to load and use the model.

@ThilinaRajapakse There is an issue in your snippet:

model = ClassificationModel('roberta', 'model/', use_cuda=False, num_labels=n, args=model_args)

# ...

outputs = model(**inputs)

If I run that I get TypeError: 'ClassificationModel' object is not callable.

I looked at the code of ClassificationModel.predict and it calls self.model(**inputs), so I instead ran outputs = model.model(**inputs):

from simpletransformers.classification import ClassificationModel
from transformers import RobertaTokenizer

# ...

model_args = {"use_multiprocessing": False}
model = ClassificationModel('roberta', 'model/', use_cuda=False, num_labels=n, args=model_args)
tokenizer = RobertaTokenizer.from_pretrained("model")


def prediction_test(text):
    """Simple function for Flask with no bells and whistles"""

    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.model(**inputs)

    return outputs

And it still hangs the same way on model.model(**inputs) when the model is loaded before the workers are spawned, and prediction_test is called from a worker.


For now, we’ve updated the server so that it loads the model in each worker (the last point of my initial message), which means that the first request a worker handles after it is spawned is always slower. Is that the recommended approach?
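One way to avoid the slow first request while still loading the model in each worker is to trigger the load right after each worker forks. Below is only a sketch of that idea using uwsgi's postfork hook (it assumes the get_model helper from the classifier.py snippet above, and it only works when the module is imported under uwsgi); I don't know if this is the recommended approach either:

# In the module loaded by uwsgi (e.g. main.py)
from uwsgidecorators import postfork

from classifier import get_model

@postfork
def warm_up_model():
    # Runs once in every worker right after fork, so the model is loaded
    # per worker before the first request arrives
    get_model()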