sagemaker-python-sdk: Increasing the timeout for InvokeEndpoint
The current timeout for InvokeEndpoint is 60 seconds as specified here: https://docs.aws.amazon.com/en_pv/sagemaker/latest/dg/API_runtime_InvokeEndpoint.html
Is there any way we can increase this limit, to say 120 seconds?
***EDIT
Just to be clear, I was able to keep the process on the server running by passing an environment variable in the Model definition like so
model = MXNetModel(..., env={'SAGEMAKER_MODEL_SERVER_TIMEOUT': '300'})
Through CloudWatch, I was able to confirm that the task keeps running even after 60 seconds. (For my use case, I am processing a video frame by frame.) My question, however, is about the client side, where I receive this kind of error due to the timeout:
An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from model with message "Your invocation timed out while waiting for a response from container model. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again.". See https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logEventViewer:group=/aws/sagemaker/Endpoints/nightingale-pose-estimation in account 552571371228 for more information.: ModelError
About this issue
- Original URL
- State: open
- Created 5 years ago
- Reactions: 30
- Comments: 26 (2 by maintainers)
@ajaykarpur Any update on this? Switching to batch transforms doesn't seem viable for video input.
I agree the 60s limit is quite low, and I hope they will bump it at least a little.
If you need more than 1 minute (and less than 15), you might be interested in the newest SageMaker offering, namely Asynchronous Inference. The client sends the payload to the endpoint, and the result eventually appears in a specified S3 bucket. You can set up SNS or Lambda to inform the client that the result is ready to consume (and, for example, generate an S3 presigned URL in the process).
More on that here: https://aws.amazon.com/about-aws/whats-new/2021/08/amazon-sagemaker-asynchronous-new-inference-option/
For disabling retries, you should be able to do something like the following (please note I haven't tested this code myself; it serves only as a reference):
Regarding the feature request, one option is to use SageMaker's batch transform (https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html). It may not fit your use case, though.
I’ve only gotten the invoke timeouts to work after contacting AWS Support and having them configure those. I’ll note they initially didn’t want to do it either, so it’s not quite as simple unfortunately.
No argument that you may want to close this ticket, but I'd kindly suggest this is still an improvement that would be a really, really nice-to-have.
The issue is described pretty well in the original post. I do not have anything to add: https://github.com/aws/sagemaker-python-sdk/issues/1119#issue-521175736
any update?
Any updates on this?
Any update?