langchainjs: Retry logic for OpenAI timeouts
I’m seeing the following error in prod:
Error [TimeoutError]: Request timed out.
at wrapOpenAIClientError (file:///app/node_modules/langchain/dist/util/openai.js:6:17)
at file:///app/node_modules/langchain/dist/chat_models/openai.js:518:31
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async RetryOperation._fn (/app/node_modules/p-retry/index.js:50:12) {
attemptNumber: 1,
retriesLeft: 6
}
It’s getting captured in my catch block, so I’m fairly sure the retries aren’t happening, unless the first attempt is the one that gets re-thrown or something confusing like that. Is it possible this doesn’t meet the criteria for a retryable error? Could this be addressed using the `FailedAttemptHandler` interface?
I’ve not set a timeout for the LLM. I’m having a hard time figuring out the default value.
```ts
/**
 * Custom handler to handle failed attempts. Takes the originally thrown
 * error object as input, and should itself throw an error if the input
 * error is not retryable.
 */
onFailedAttempt?: FailedAttemptHandler;
```
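For example, could it be addressed by passing a handler along these lines? (Just a sketch, assuming the handler receives the same error object shown in the trace above.)

```ts
const onFailedAttempt = (error: any) => {
  // Re-throw anything that shouldn't be retried; let timeouts fall through
  // so the retry loop keeps going.
  if (error?.name !== "TimeoutError") {
    throw error;
  }
};
```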
🤖
Hello,
From the error message you’ve shared, it seems like the request to OpenAI is timing out. This could be due to a variety of reasons, such as network latency or the OpenAI server taking longer to respond.
Looking at the langchainjs codebase, it appears that the default timeout for OpenAI requests is set to 1000 milliseconds (or 1 second). If the OpenAI server takes longer than this to respond, a `TimeoutError` will be thrown. You might want to consider increasing this timeout value if your network conditions or the server’s response time require it.

As for the retry logic, the current implementation in langchainjs does not seem to include any retry mechanism for OpenAI requests. The `embeddingWithRetry` method, which is responsible for making requests to the OpenAI API, does not contain any retry logic. The `OpenAIClient` instance is created with `maxRetries: 0`, which means no retries will be attempted in case of a failure.

If you want to implement a retry mechanism, you could modify the `embeddingWithRetry` method to include a retry loop, or use a library that provides retry functionality. The `onFailedAttempt` option you mentioned could indeed be used to determine whether an error is retryable or not. You would need to implement this function to check the error and throw it if it’s not retryable, allowing the retry mechanism to catch it and retry the request.

Here’s a simple example of how you could implement this:
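A sketch along these lines, assuming `ChatOpenAI` accepts `timeout`, `maxRetries`, and `onFailedAttempt` in its constructor, and that the handler receives the error shown in the trace above:

```ts
import { ChatOpenAI } from "langchain/chat_models/openai";

const model = new ChatOpenAI({
  modelName: "gpt-3.5-turbo",
  timeout: 30000, // per-attempt timeout in milliseconds
  maxRetries: 6,
  // Called on every failed attempt; throwing here stops further retries.
  onFailedAttempt: (error: any) => {
    if (error?.name !== "TimeoutError") {
      // Not a timeout: treat it as non-retryable and surface it immediately.
      throw error;
    }
    console.warn(
      `Attempt ${error.attemptNumber} timed out, ${error.retriesLeft} retries left`
    );
    // Returning (instead of throwing) lets the retry loop continue.
  },
});
```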
In this example, the `onFailedAttempt` function checks if the error is a `TimeoutError`, and if it’s not, it throws the error, stopping the retry loop. If it is a `TimeoutError`, it logs the attempt number and the number of retries left, and allows the retry loop to continue.

I hope this helps! Let me know if you have any other questions.
I just found out that retries are not happening at all, despite setting maxRetries to a valid value in the langchain ChatOpenAI object.
This sort of bug remaining unsolved for so long makes me doubt whether anyone uses langchainjs at all.
Can the maintainers fix this?
I don’t understand why this is being closed. Seems to me that there is still a clear bug here even if there is a workaround.
@jacoblee93 I understand your point, and it’s valid. In my case, I understood the timeout as the maximum time to wait for the LLM API to respond, so I expected a retry if the response didn’t arrive in time.
Sometimes, for whatever reason, a request to OpenAI times out while a second request gets answered almost instantly. So I tend to set short timeouts (5 seconds), since waiting for a retry to succeed is usually much faster than setting a longer timeout and waiting for the first request to answer.
If it is to stay as it is, I think it should be mentioned in the docs that a timed out request won’t be retried automatically.
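A sketch of the configuration described above, assuming `timeout` on `ChatOpenAI` is in milliseconds (and keeping in mind that, per this thread, a timed-out request is not retried by the default handler):

```ts
import { ChatOpenAI } from "langchain/chat_models/openai";

// Fail fast and retry, rather than waiting a long time on the first attempt.
const model = new ChatOpenAI({
  timeout: 5000, // give each attempt 5 seconds
  maxRetries: 2, // caveat: timeouts are not retried unless onFailedAttempt allows it
});
```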
Oh, you should also just pass `onFailedAttempt` where you’d be able to pass `maxRetries` and it would work as well. I don’t think you’d need to subclass.
If that was their desire, wouldn’t they just set retries to 0?
The PR doesn’t actually fix the bug as I discussed with @jacoblee93. It is still an open issue.
See https://github.com/langchain-ai/langchainjs/pull/3370#discussion_r1402660699
For posterity, this more or less solved my problem, though it’s probably not the best solution (see below).
Some more details here: https://github.com/openai/openai-node/blob/master/README.md#retries
Maybe this is all you need: increase `maxRetries` on the OpenAI client? The documentation should state clearly that there are two non-overlapping retry strategies, one in the OpenAI client and one in LangChain, and what each of them controls.
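A sketch of where each knob lives, assuming `ChatOpenAI` forwards its `configuration` object to the underlying openai-node client (option names may differ between versions):

```ts
import { ChatOpenAI } from "langchain/chat_models/openai";

const model = new ChatOpenAI({
  // LangChain-level retries: the p-retry loop that wraps each call.
  maxRetries: 6,
  // Forwarded to the openai-node client, which has its own built-in retry
  // logic for connection errors and certain status codes (see the README above).
  configuration: {
    maxRetries: 2,
    timeout: 60000, // per-request timeout in milliseconds
  },
});
```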
The default failure handler looks like the culprit:
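Roughly speaking, the default handler re-throws timeout/abort-style errors instead of letting the retry loop continue; a simplified sketch of that behavior (not the exact source):

```ts
// Simplified sketch: errors that look like timeouts or aborts are re-thrown,
// which stops the p-retry loop, so a TimeoutError is never retried.
const defaultFailedAttemptHandler = (error: any) => {
  if (
    error?.name === "TimeoutError" ||
    error?.name === "AbortError" ||
    error?.message?.startsWith("Cancel")
  ) {
    throw error; // treated as non-retryable
  }
  // ...other non-retryable cases (e.g. certain 4xx statuses) are also re-thrown.
};
```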
However, reviewing OpenAI’s documentation (which lists request timeouts among the errors its client retries by default), it sounds like this should be retryable to me.