transformers: ModelError while deploying FlanT5-xl

System Info

transformers_version==4.17.0, Platform = SageMaker Notebook, python==3.9.0

Who can help?

@ArthurZucker @younesbelkada

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

Amazon SageMaker deployment script in AWS for flan-t5-xl:

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()
# Hub Model configuration. https://huggingface.co/models
hub = {
	'HF_MODEL_ID':'google/flan-t5-xl',
	'HF_TASK':'text2text-generation'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
	transformers_version='4.17.0',
	pytorch_version='1.10.2',
	py_version='py38',
	env=hub,
	role=role, 
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
	initial_instance_count=1, # number of instances
	instance_type='ml.m5.xlarge' # ec2 instance type
)

predictor.predict({
	'inputs': "The answer to the universe is"
})

Results in

---------------------------------------------------------------------------
ModelError                                Traceback (most recent call last)
/tmp/ipykernel_20116/1338286066.py in <cell line: 26>()
     24 )
     25 
---> 26 predictor.predict({
     27         'inputs': "The answer to the universe is"
     28 })

~/anaconda3/envs/python3/lib/python3.10/site-packages/sagemaker/predictor.py in predict(self, data, initial_args, target_model, target_variant, inference_id)
    159             data, initial_args, target_model, target_variant, inference_id
    160         )
--> 161         response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
    162         return self._handle_response(response)
    163 

~/anaconda3/envs/python3/lib/python3.10/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    528                 )
    529             # The "self" in this scope is referring to the BaseClient.
--> 530             return self._make_api_call(operation_name, kwargs)
    531 
    532         _api_call.__name__ = str(py_operation_name)

~/anaconda3/envs/python3/lib/python3.10/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    958             error_code = parsed_response.get("Error", {}).get("Code")
    959             error_class = self.exceptions.from_code(error_code)
--> 960             raise error_class(parsed_response, operation_name)
    961         else:
    962             return parsed_response

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "Could not load model /.sagemaker/mms/models/google__flan-t5-xl with any of the following classes: (\u003cclass \u0027transformers.models.auto.modeling_auto.AutoModelForSeq2SeqLM\u0027\u003e, \u003cclass \u0027transformers.models.t5.modeling_t5.T5ForConditionalGeneration\u0027\u003e)."
}
"

From an existing issue, I suspected this might be due to the use of transformers==4.17.0; however, when I use the exact same script to deploy the flan-t5-large model, it works without any issues.

Expected behavior

The model should get deployed on AWS SageMaker without any issues.

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 2
  • Comments: 18 (7 by maintainers)

Most upvoted comments

If you check this blog post, there is a code snippet on how to do this for t5-11b: https://www.philschmid.de/deploy-t5-11b

import torch
from transformers import AutoModelWithLMHead
from huggingface_hub import HfApi

# load model as float16
model = AutoModelWithLMHead.from_pretrained("t5-11b", torch_dtype=torch.float16, low_cpu_mem_usage=True)
# shard the model and save it locally; a sketch of the push step follows below
model.save_pretrained("sharded", max_shard_size="2000MB")
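The snippet above imports HfApi but stops after saving the sharded checkpoint locally. As a hedged follow-up sketch of the push step (the repo id below is a placeholder, and this assumes a recent huggingface_hub that provides create_repo/upload_folder):

api = HfApi()
# Placeholder repo id -- create or reuse a model repo you control on the Hub.
api.create_repo("your-user/t5-11b-sharded", repo_type="model", exist_ok=True)
# Upload the folder produced by save_pretrained("sharded", max_shard_size="2000MB")
api.upload_folder(
    folder_path="sharded",
    repo_id="your-user/t5-11b-sharded",
    repo_type="model",
)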

@RonLek I am planning to create an example. I'll post it here once it is ready.

When you provide a model_data keyword argument, you also have to include the inference.py and the model weights.
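To illustrate the expected archive layout, here is a minimal sketch of how such a model.tar.gz could be assembled (the local directory names are placeholders; the code/ subfolder is the convention the SageMaker Hugging Face container looks for):

import tarfile

# Target layout inside model.tar.gz:
#   config.json, pytorch_model*.bin, tokenizer files  <- model weights at the root
#   code/inference.py                                 <- custom handlers (optional)
#   code/requirements.txt                             <- extra pip dependencies (optional)
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("model", arcname=".")     # placeholder: local dir holding the downloaded weights
    tar.add("code", arcname="code")   # placeholder: local dir with inference.py / requirements.txt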

@philschmid what should be the contents of the inference.py in case of the flan-t5-xl model? Can this be an empty file if I don’t intend to change anything from the hub model? There doesn’t seem to be such a file included within the Hugging Face repository.
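For readers with the same question, here is a minimal sketch of what a code/inference.py could contain. It is not an official file; the model_fn/predict_fn names follow the SageMaker Hugging Face inference toolkit's handler contract, and the float16 cast is an illustrative choice, not something from this thread:

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

def model_fn(model_dir):
    # Called once at container startup; model_dir is the unpacked model.tar.gz.
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_dir, torch_dtype=torch.float16)
    return model, tokenizer

def predict_fn(data, model_and_tokenizer):
    # Called for each request with the deserialized JSON payload.
    model, tokenizer = model_and_tokenizer
    inputs = tokenizer(data["inputs"], return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=64)
    return [{"generated_text": tokenizer.decode(output_ids[0], skip_special_tokens=True)}]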

@valentinboyanov I can confirm I am seeing the same behavior. From the CW logs it seems that 4.17.0 is uninstalled and replaced with the latest version specified in the requirements.txt file.

@younesbelkada if I change it, I’m unable to deploy at all:

    raise ValueError(
ValueError: Unsupported huggingface version: 4.26.0. You may need to upgrade your SDK version (pip install -U sagemaker) for newer huggingface versions. Supported huggingface version(s): 4.6.1, 4.10.2, 4.11.0, 4.12.3, 4.17.0, 4.6, 4.10, 4.11, 4.12, 4.17.

This is why I’ve followed the instructions by Heiko Hotz (marshmellow77) in this comment and provided a requirements.txt file that lets me specify the dependencies I want installed in the container.
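For concreteness, that workaround boils down to shipping a one-line pin inside the archive, which matches the pip output in the logs further below:

# code/requirements.txt
transformers==4.26.0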

Hi @younesbelkada and @RonLek! I have the same issue deploying google/flan-t5-xxl on SageMaker.

I’ve tried to update to transformers==4.26.0 by providing code/requirements.txt through s3://sagemaker-eu-north-1-***/model.tar.gz:

# Hub Model configuration. https://huggingface.co/models
hub: dict = {"HF_MODEL_ID": "google/flan-t5-xxl", "HF_TASK": "text2text-generation"}

# Create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version="4.17.0",
    pytorch_version="1.10.2",
    py_version="py38",
    model_data="s3://sagemaker-eu-north-1-***/model.tar.gz",
    env=hub,
    role=role,
)

Observing the AWS logs I can see that transformers==4.26.0 was installed:

This is an experimental beta features, which allows downloading model from the Hugging Face Hub on start up. It loads the model defined in the env var `HF_MODEL_ID`
/opt/conda/lib/python3.8/site-packages/huggingface_hub/file_download.py:588: FutureWarning: `cached_download` is the legacy way to download files from the HF hub, please consider upgrading to `hf_hub_download`  warnings.warn(
Downloading: 100%|██████████| 11.0k/11.0k [00:00<00:00, 5.49MB/s]
Downloading: 100%|██████████| 674/674 [00:00<00:00, 663kB/s]
Downloading: 100%|██████████| 2.20k/2.20k [00:00<00:00, 2.24MB/s]
Downloading: 100%|██████████| 792k/792k [00:00<00:00, 43.5MB/s]
Downloading: 100%|██████████| 2.42M/2.42M [00:00<00:00, 3.12MB/s]
Downloading: 100%|██████████| 2.54k/2.54k [00:00<00:00, 2.62MB/s]
WARNING - Overwriting /.sagemaker/mms/models/google__flan-t5-xxl ...
Collecting transformers==4.26.0
  Downloading transformers-4.26.0-py3-none-any.whl (6.3 MB) ━━━━━━━━ 6.3/6.3 MB 65.9 MB/s eta 0:00:00
Requirement already satisfied: requests in /opt/conda/lib/python3.8/site-packages (from transformers==4.26.0->-r /opt/ml/model/code/requirements.txt (line 1)) (2.28.1)
Collecting huggingface-hub<1.0,>=0.11.0
  Downloading huggingface_hub-0.12.0-py3-none-any.whl (190 kB) ━━━━━━━━ 190.3/190.3 kB 46.0 MB/s eta 0:00:00
Requirement already satisfied: numpy>=1.17 in /opt/conda/lib/python3.8/site-packages (from transformers==4.26.0->-r /opt/ml/model/code/requirements.txt (line 1)) (1.23.3)
Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in /opt/conda/lib/python3.8/site-packages (from transformers==4.26.0->-r /opt/ml/model/code/requirements.txt (line 1)) (0.13.0)
Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.8/site-packages (from transformers==4.26.0->-r /opt/ml/model/code/requirements.txt (line 1)) (21.3)
Requirement already satisfied: tqdm>=4.27 in /opt/conda/lib/python3.8/site-packages (from transformers==4.26.0->-r /opt/ml/model/code/requirements.txt (line 1)) (4.64.1)
Requirement already satisfied: pyyaml>=5.1 in /opt/conda/lib/python3.8/site-packages (from transformers==4.26.0->-r /opt/ml/model/code/requirements.txt (line 1)) (6.0)
Requirement already satisfied: filelock in /opt/conda/lib/python3.8/site-packages (from transformers==4.26.0->-r /opt/ml/model/code/requirements.txt (line 1)) (3.8.0)
Requirement already satisfied: regex!=2019.12.17 in /opt/conda/lib/python3.8/site-packages (from transformers==4.26.0->-r /opt/ml/model/code/requirements.txt (line 1)) (2022.9.13)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /opt/conda/lib/python3.8/site-packages (from huggingface-hub<1.0,>=0.11.0->transformers==4.26.0->-r /opt/ml/model/code/requirements.txt (line 1)) (4.3.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/conda/lib/python3.8/site-packages (from packaging>=20.0->transformers==4.26.0->-r /opt/ml/model/code/requirements.txt (line 1)) (3.0.9)
Requirement already satisfied: charset-normalizer<3,>=2 in /opt/conda/lib/python3.8/site-packages (from requests->transformers==4.26.0->-r /opt/ml/model/code/requirements.txt (line 1)) (2.0.12)
Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.8/site-packages (from requests->transformers==4.26.0->-r /opt/ml/model/code/requirements.txt (line 1)) (3.4)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.8/site-packages (from requests->transformers==4.26.0->-r /opt/ml/model/code/requirements.txt (line 1)) (1.26.11)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.8/site-packages (from requests->transformers==4.26.0->-r /opt/ml/model/code/requirements.txt (line 1)) (2022.9.24)
Installing collected packages: huggingface-hub, transformers
  Attempting uninstall: huggingface-hub
    Found existing installation: huggingface-hub 0.10.0
    Uninstalling huggingface-hub-0.10.0:
      Successfully uninstalled huggingface-hub-0.10.0
  Attempting uninstall: transformers
    Found existing installation: transformers 4.17.0
    Uninstalling transformers-4.17.0:
      Successfully uninstalled transformers-4.17.0
Successfully installed huggingface-hub-0.12.0 transformers-4.26.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[notice] A new release of pip available: 22.2.2 -> 23.0
[notice] To update, run: pip install --upgrade pip
Warning: MMS is using non-default JVM parameters: -XX:-UseContainerSupport
2023-02-01T15:46:06,090 [INFO ] main com.amazonaws.ml.mms.ModelServer -
MMS Home: /opt/conda/lib/python3.8/site-packages
Current directory: /
Temp directory: /home/model-server/tmp
Number of GPUs: 0
Number of CPUs: 4
Max heap size: 3461 M
Python executable: /opt/conda/bin/python3.8
Config file: /etc/sagemaker-mms.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8080
Model Store: /.sagemaker/mms/models
Initial Models: ALL
Log dir: null
Metrics dir: null
Netty threads: 0
Netty client threads: 0
Default workers per model: 4
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Preload model: false
Prefer direct buffer: false
2023-02-01T15:46:06,140 [WARN ] W-9000-google__flan-t5-xxl com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-9000-google__flan-t5-xxl
2023-02-01T15:46:06,204 [INFO ] W-9000-google__flan-t5-xxl-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model_service_worker started with args: --sock-type unix --sock-name /home/model-server/tmp/.mms.sock.9000 --handler sagemaker_huggingface_inference_toolkit.handler_service --model-path /.sagemaker/mms/models/google__flan-t5-xxl --model-name google__flan-t5-xxl --preload-model false --tmp-dir /home/model-server/tmp
2023-02-01T15:46:06,205 [INFO ] W-9000-google__flan-t5-xxl-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Listening on port: /home/model-server/tmp/.mms.sock.9000
2023-02-01T15:46:06,205 [INFO ] W-9000-google__flan-t5-xxl-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [PID] 47
2023-02-01T15:46:06,206 [INFO ] W-9000-google__flan-t5-xxl-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - MMS worker started.
2023-02-01T15:46:06,206 [INFO ] W-9000-google__flan-t5-xxl-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Python runtime: 3.8.10
2023-02-01T15:46:06,206 [INFO ] main com.amazonaws.ml.mms.wlm.ModelManager - Model google__flan-t5-xxl loaded.
2023-02-01T15:46:06,210 [INFO ] main com.amazonaws.ml.mms.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2023-02-01T15:46:06,218 [INFO ] W-9000-google__flan-t5-xxl com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.mms.sock.9000
2023-02-01T15:46:06,218 [INFO ] W-9000-google__flan-t5-xxl com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.mms.sock.9000
2023-02-01T15:46:06,219 [INFO ] W-9000-google__flan-t5-xxl com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.mms.sock.9000
2023-02-01T15:46:06,226 [INFO ] W-9000-google__flan-t5-xxl com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.mms.sock.9000
2023-02-01T15:46:06,278 [INFO ] W-9000-google__flan-t5-xxl-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /home/model-server/tmp/.mms.sock.9000.
2023-02-01T15:46:06,281 [INFO ] W-9000-google__flan-t5-xxl-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /home/model-server/tmp/.mms.sock.9000.
2023-02-01T15:46:06,284 [INFO ] W-9000-google__flan-t5-xxl-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /home/model-server/tmp/.mms.sock.9000.
2023-02-01T15:46:06,290 [INFO ] W-9000-google__flan-t5-xxl-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /home/model-server/tmp/.mms.sock.9000.
2023-02-01T15:46:06,298 [INFO ] main com.amazonaws.ml.mms.ModelServer - Inference API bind to: http://0.0.0.0:8080
Model server started.
2023-02-01T15:46:06,302 [WARN ] pool-3-thread-1 com.amazonaws.ml.mms.metrics.MetricCollector - worker pid is not available yet.
2023-02-01T15:46:08,478 [INFO ] W-9000-google__flan-t5-xxl-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Model google__flan-t5-xxl loaded io_fd=3abd6afffe6261f4-0000001d-00000000-084f36d4c5a81b10-639dfd41
2023-02-01T15:46:08,491 [INFO ] W-9000-google__flan-t5-xxl com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 2081
2023-02-01T15:46:08,493 [WARN ] W-9000-google__flan-t5-xxl com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-google__flan-t5-xxl-1
2023-02-01T15:46:08,499 [INFO ] W-9000-google__flan-t5-xxl-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Model google__flan-t5-xxl loaded io_fd=3abd6afffe6261f4-0000001d-00000001-c96df6d4c5a81b10-276a10eb
2023-02-01T15:46:08,500 [INFO ] W-9000-google__flan-t5-xxl com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 2089
2023-02-01T15:46:08,500 [WARN ] W-9000-google__flan-t5-xxl com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-google__flan-t5-xxl-3
2023-02-01T15:46:08,512 [INFO ] W-9000-google__flan-t5-xxl-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Model google__flan-t5-xxl loaded io_fd=3abd6afffe6261f4-0000001d-00000004-12e7f154c5a81b12-fe262c46
2023-02-01T15:46:08,512 [INFO ] W-9000-google__flan-t5-xxl com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 2101
2023-02-01T15:46:08,513 [WARN ] W-9000-google__flan-t5-xxl com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-google__flan-t5-xxl-4
2023-02-01T15:46:08,561 [INFO ] W-9000-google__flan-t5-xxl-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Model google__flan-t5-xxl loaded io_fd=3abd6afffe6261f4-0000001d-00000003-6582f154c5a81b12-273338b8
2023-02-01T15:46:08,561 [INFO ] W-9000-google__flan-t5-xxl com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 2150
2023-02-01T15:46:08,561 [WARN ] W-9000-google__flan-t5-xxl com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-google__flan-t5-xxl-2
2023-02-01T15:46:10,450 [INFO ] pool-2-thread-6 ACCESS_LOG - /169.254.178.2:59002 "GET /ping HTTP/1.1" 200 7
2023-02-01T15:46:15,412 [INFO ] pool-2-thread-6 ACCESS_LOG - /169.254.178.2:59002 "GET /ping HTTP/1.1" 200 0
2023-02-01T15:46:20,411 [INFO ] pool-2-thread-6 ACCESS_LOG - /169.254.178.2:59002 "GET /ping HTTP/1.1" 200 0

But I got the same error when trying to run inference:

botocore.errorfactory.ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "Could not load model /.sagemaker/mms/models/google__flan-t5-xxl with any of the following classes: (\u003cclass \u0027transformers.models.auto.modeling_auto.AutoModelForSeq2SeqLM\u0027\u003e, \u003cclass \u0027transformers.models.t5.modeling_t5.T5ForConditionalGeneration\u0027\u003e)."
}

AWS logs:

2023-02-01T15:49:59,831 [INFO ] W-google__flan-t5-xxl-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Prediction error
2023-02-01T15:49:59,832 [INFO ] W-google__flan-t5-xxl-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last):
2023-02-01T15:49:59,832 [INFO ] W-google__flan-t5-xxl-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.8/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 219, in handle
2023-02-01T15:49:59,832 [INFO ] W-google__flan-t5-xxl-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     self.initialize(context)
2023-02-01T15:49:59,832 [INFO ] W-google__flan-t5-xxl-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.8/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 77, in initialize
2023-02-01T15:49:59,832 [INFO ] W-9000-google__flan-t5-xxl com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 1
2023-02-01T15:49:59,833 [INFO ] W-google__flan-t5-xxl-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     self.model = self.load(self.model_dir)
2023-02-01T15:49:59,833 [INFO ] W-google__flan-t5-xxl-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.8/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 104, in load
2023-02-01T15:49:59,833 [INFO ] W-google__flan-t5-xxl-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     hf_pipeline = get_pipeline(task=os.environ["HF_TASK"], model_dir=model_dir, device=self.device)
2023-02-01T15:49:59,833 [INFO ] W-google__flan-t5-xxl-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.8/site-packages/sagemaker_huggingface_inference_toolkit/transformers_utils.py", line 272, in get_pipeline
2023-02-01T15:49:59,833 [INFO ] W-google__flan-t5-xxl-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     hf_pipeline = pipeline(task=task, model=model_dir, device=device, **kwargs)
2023-02-01T15:49:59,834 [INFO ] W-google__flan-t5-xxl-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.8/site-packages/transformers/pipelines/__init__.py", line 754, in pipeline
2023-02-01T15:49:59,834 [INFO ] W-google__flan-t5-xxl-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     framework, model = infer_framework_load_model(
2023-02-01T15:49:59,834 [INFO ] W-9000-google__flan-t5-xxl ACCESS_LOG - /169.254.178.2:59002 "POST /invocations HTTP/1.1" 400 13
2023-02-01T15:49:59,834 [INFO ] W-google__flan-t5-xxl-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.8/site-packages/transformers/pipelines/base.py", line 266, in infer_framework_load_model
2023-02-01T15:49:59,834 [INFO ] W-google__flan-t5-xxl-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     raise ValueError(f"Could not load model {model} with any of the following classes: {class_tuple}.")
2023-02-01T15:49:59,835 [INFO ] W-google__flan-t5-xxl-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - ValueError: Could not load model /.sagemaker/mms/models/google__flan-t5-xxl with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForSeq2SeqLM'>, <class 'transformers.models.t5.modeling_t5.T5ForConditionalGeneration'>).
2023-02-01T15:49:59,835 [INFO ] W-google__flan-t5-xxl-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -
2023-02-01T15:49:59,835 [INFO ] W-google__flan-t5-xxl-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - During handling of the above exception, another exception occurred:
2023-02-01T15:49:59,835 [INFO ] W-google__flan-t5-xxl-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -
2023-02-01T15:49:59,836 [INFO ] W-google__flan-t5-xxl-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last):
2023-02-01T15:49:59,836 [INFO ] W-google__flan-t5-xxl-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.8/site-packages/mms/service.py", line 108, in predict
2023-02-01T15:49:59,836 [INFO ] W-google__flan-t5-xxl-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     ret = self._entry_point(input_batch, self.context)
2023-02-01T15:49:59,836 [INFO ] W-google__flan-t5-xxl-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/opt/conda/lib/python3.8/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 243, in handle
2023-02-01T15:49:59,836 [INFO ] W-google__flan-t5-xxl-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     raise PredictionException(str(e), 400)
2023-02-01T15:49:59,837 [INFO ] W-google__flan-t5-xxl-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - mms.service.PredictionException: Could not load model /.sagemaker/mms/models/google__flan-t5-xxl with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForSeq2SeqLM'>, <class 'transformers.models.t5.modeling_t5.T5ForConditionalGeneration'>). : 400