pytriton: Error deploying model on Vertex AI
Description
Hi! I’m trying to deploy a Stable Diffusion model on GCP Vertex AI using the PyTriton backend. My code works on a local machine, where I’ve been able to send requests and receive inference responses.
The problem arises when I try to create an endpoint in Vertex AI. The server run fails with this error:
WARNING - pytriton.server.triton_server: Triton Inference Server exited with failure. Please wait.
And then:
failed to start Vertex AI service: Invalid argument - Expect the model repository contains only a single model if default model is not specified
...
raise PyTritonClientTimeoutError("Waiting for server to be ready timed out.")
I don’t know whether the Vertex AI service error is caused by the server crashing first, or vice versa.
To reproduce
Attaching my server code:
# server
import argparse
import logging
import os
import time

import numpy as np
import torch
from model import ModelWrapper
from pytriton.decorators import batch
from pytriton.model_config import DynamicBatcher, ModelConfig, Tensor
from pytriton.triton import Triton, TritonConfig

# parse_path, download_blob and _decode_img are project helpers (omitted here)

LOGGER = logging.getLogger("StableDiffusion_Img2Img.server")
# values as reported in the logs below
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
DTYPE = torch.float16
PORT = 8015


class _InferFuncWrapper:
    """Wrapper around the inference function for Triton; also stores the model."""

    def __init__(self, model: torch.nn.Module):
        self._model = model

    @batch
    def __call__(self, **inputs) -> dict:
        """Main inference function for the Triton backend, called per batch.

        Decodes the inputs, calls the model and returns the outputs.

        Args:
            prompt: batch of strings with the user prompts
            init_image: batch of initial images to run the diffusion on

        Returns:
            image: batch of generated images
        """
        (prompts, init_images) = inputs.values()  # ordered as bound: prompt, init_image
        # decode prompts and images from bytes to UTF-8 strings
        prompts = [np.char.decode(p.astype("bytes"), "utf-8").item() for p in prompts]
        init_images = [
            np.char.decode(enc_img.astype("bytes"), "utf-8").item()
            for enc_img in init_images
        ]
        init_images = [_decode_img(enc_img) for enc_img in init_images]
        # transform image arrays to tensors and adjust dims to torch NCHW layout
        images_tensors = torch.tensor(init_images, dtype=torch.float32).permute(
            0, 3, 1, 2
        )
        LOGGER.debug(f"Prompts: {prompts}")
        LOGGER.debug(f"{len(init_images)} images size: {init_images[0].shape}")
        LOGGER.info("Generating images...")
        # call the diffusion model
        outputs = self._model.run(prompts, images_tensors)
        LOGGER.debug(f"Prepared batch response of size: {len(outputs)}")
        return {"image": np.array(outputs)}


def _parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--verbose",
        "-v",
        action="store_true",
        help="Enable verbose logging in debug mode.",
        default=True,
    )
    parser.add_argument(
        "--vertex",
        "-s",
        action="store_true",
        help="Enable copying model files from storage for Vertex deployment.",
        default=False,
    )
    return parser.parse_args()


def main():
    """Initialize the server with the model."""
    args = _parse_args()
    # initialize logging
    log_level = logging.DEBUG if args.verbose else logging.INFO
    logging.basicConfig(
        level=log_level, format="%(asctime)s - %(levelname)s - %(name)s: %(message)s"
    )
    if args.vertex:
        LOGGER.debug("Vertex: Loading pipeline from Vertex Storage")
        storage_path = os.environ["AIP_STORAGE_URI"]
    else:
        LOGGER.debug("Loading pipeline locally")
        storage_path = ""  # path to local files
    bucket_name, subdirectory = parse_path(storage_path)
    LOGGER.debug(f"Downloading files... Started at: {time.strftime('%X')}")
    download_blob(bucket_name, subdirectory)
    LOGGER.debug(f"Files downloaded! Finished at: {time.strftime('%X')}")
    folder_path = os.path.join("src", subdirectory)
    LOGGER.debug(f"Running on device: {DEVICE}, dtype: {DTYPE}, triton_port: {PORT}")
    LOGGER.info("Loading pipeline...")
    model = ModelWrapper(logger=LOGGER, folder_path=folder_path)
    LOGGER.info("Pipeline loaded!")
    log_verbose = 1 if args.verbose else 0
    config = TritonConfig(http_port=8015, exit_on_error=True, log_verbose=log_verbose)
    with Triton(config=config) as triton:
        # bind the model with its inference callable and configuration
        triton.bind(
            model_name="StableDiffusion_Img2Img",
            infer_func=_InferFuncWrapper(model=model),
            inputs=[
                Tensor(name="prompt", dtype=np.bytes_, shape=(1,)),
                Tensor(name="init_image", dtype=np.bytes_, shape=(1,)),
            ],
            outputs=[
                Tensor(name="image", dtype=np.bytes_, shape=(1,)),
            ],
            config=ModelConfig(
                max_batch_size=4,
                batcher=DynamicBatcher(
                    max_queue_delay_microseconds=100,
                ),
            ),
            strict=True,
        )
        # serve the model for inference
        triton.serve()


if __name__ == "__main__":
    main()
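For context, this is roughly how requests are sent on the local machine. A minimal sketch using PyTriton's ModelClient; the prompt text and the encoded_image placeholder are illustrative, and the real client serializes the init image to a UTF-8 string that _decode_img understands:
# local client sketch -- prompt text and encoded_image are placeholders
import numpy as np
from pytriton.client import ModelClient

encoded_image = "..."  # stand-in for the serialized init image
with ModelClient("localhost:8015", "StableDiffusion_Img2Img") as client:
    result = client.infer_batch(
        prompt=np.array([["a photo of an astronaut"]], dtype=np.bytes_),
        init_image=np.array([[encoded_image]], dtype=np.bytes_),
    )
print(result["image"].shape)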
When creating the Vertex endpoint, the server predict route is configured as:
/v2/models/StableDiffusion_Img2Img/infer
and the health route as:
/v2/health/live
The Vertex port is set to 8015, the same as the HTTP port in the Triton configuration.
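For reference, the endpoint is created along these lines with the google-cloud-aiplatform SDK (a sketch; the display name, image URI and machine spec are placeholders, while the routes and port match the values above):
# sketch of the model upload/deploy -- names and URIs are placeholders
from google.cloud import aiplatform

model = aiplatform.Model.upload(
    display_name="stable-diffusion-img2img",
    serving_container_image_uri="europe-west4-docker.pkg.dev/my-project/my-repo/sd-server:latest",
    serving_container_predict_route="/v2/models/StableDiffusion_Img2Img/infer",
    serving_container_health_route="/v2/health/live",
    serving_container_ports=[8015],
)
endpoint = model.deploy(
    machine_type="n1-standard-8",
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
)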
Observed results and expected behavior
As stated, the server runs on a local machine but fails to initialize the endpoint in Vertex AI.
During the Vertex build, the files are correctly downloaded and the model pipeline is loaded, so the error probably occurs around the triton.bind() call.
Attaching the complete log output:
DEBUG - StableDiffusion_Img2Img.server: Files downloaded! Finished at: 18:28:01
DEBUG - StableDiffusion_Img2Img.server: Running on device: cuda, dtype: torch.float16, triton_port:8015
INFO - StableDiffusion_Img2Img.server: Loading pipeline..
INFO - StableDiffusion_Img2Img.server: Pipeline loaded!
...
2023-11-23 18:29:10,322 - DEBUG - pytriton.triton: Triton Inference Server binaries ready in /root/.cache/pytriton/workspace_y7vpgv3x/tritonserver
2023-11-23 18:29:10,322 - DEBUG - pytriton.utils.distribution: Obtained pytriton module path: /usr/local/lib/python3.10/dist-packages/pytriton
2023-11-23 18:29:10,323 - DEBUG - pytriton.utils.distribution: Obtained nvidia_pytriton.libs path: /usr/local/lib/python3.10/dist-packages/nvidia_pytriton.libs
2023-11-23 18:29:10,323 - DEBUG - pytriton.client.client: Creating InferenceServerClient for http://127.0.0.1:8015 with {'network_timeout': 60.0, 'connection_timeout': 60.0}
2023-11-23 18:29:10,323 - DEBUG - pytriton.client.client: Creating InferenceServerClient for http://127.0.0.1:8015 with {'network_timeout': 60.0, 'connection_timeout': 60.0}
2023-11-23 18:29:10,323 - DEBUG - pytriton.triton: Starting Triton Inference
2023-11-23 18:29:10,324 - DEBUG - pytriton.server.triton_server: Triton Server binary /root/.cache/pytriton/workspace_y7vpgv3x/tritonserver/bin/tritonserver. Environment:
{
...
}
2023-11-23 18:29:10,449 - DEBUG - pytriton.client.utils: Waiting for server to be ready (timeout=119.99996042251587)
2023-11-23 18:29:12,954 - WARNING - pytriton.server.triton_server: Triton Inference Server exited with failure. Please wait
2023-11-23 18:29:12,954 - DEBUG - pytriton.server.triton_server: Triton Inference Server exit code 1
2023-11-23 18:29:12,954 - DEBUG - pytriton.triton: Got callback that tritonserver process finished
2023-11-23 15:31:10.655 Traceback (most recent call last):
2023-11-23 15:31:10.655 File "/home/app/src/server.py", line 200, in <module>
2023-11-23 18:31:10,655 - DEBUG - pytriton.triton: Cleaning model manager, tensor store and workspace.
failed to start Vertex AI service: Invalid argument - Expect the model repository contains only a single model if default model is not specified
2023-11-23 18:31:10,655 - DEBUG - pytriton.utils.workspace: Cleaning workspace dir /root/.cache/pytriton/workspace_y7vpgv3x
raise PyTritonClientTimeoutError("Waiting for server to be ready timed out.")
pytriton.client.exceptions.PyTritonClientTimeoutError: Waiting for server to be ready timed out.
Additional steps taken
Based on the timeout error raised by PyTriton, we tried increasing the timeout by setting monitoring_period_s in server.run() to an arbitrarily high value.
We’ve also tried adapting the server configuration to Vertex with:
TritonConfig(http_port=8015, exit_on_error=True, log_verbose=log_verbose, allow_vertex_ai=True, vertex_ai_port=8080)
but we get the same error.
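A possible direction suggested by the error message itself is to name the bound model as the Vertex AI default model. A sketch, assuming TritonConfig exposes tritonserver's --vertex-ai-default-model flag the same way it exposes allow_vertex_ai and vertex_ai_port:
# sketch: declare the bound model as the Vertex AI default model
# (assumes TritonConfig exposes tritonserver's --vertex-ai-default-model flag)
config = TritonConfig(
    http_port=8015,
    exit_on_error=True,
    log_verbose=log_verbose,
    allow_vertex_ai=True,
    vertex_ai_port=8080,
    vertex_ai_default_model="StableDiffusion_Img2Img",
)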
Environment
Docker base image: nvcr.io/nvidia/pytorch:23.10-py3
Requirements:
torch @ https://download.pytorch.org/whl/cu116/torch-1.12.1%2Bcu116-cp310-cp310-linux_x86_64.whl
diffusers==0.7.2
transformers==4.21.3
ftfy==6.1.1
importlib-metadata==4.13.0
nvidia-pytriton==0.4.1
Pillow==9.5
google-cloud-storage==2.10.0
Any help is appreciated!!
Comments
Thanks @sricke! Let us review that and get back to you.
PyTriton 0.5.2 introduced support for Vertex AI. See the example for more details.