node-slack-sdk: What can I do about `Retry header did not contain a valid timeout`?

Hey there,

I’ve run into 3 occurrences of this error: `Retry header did not contain a valid timeout` (thrown from here)

The client was using the outdated @slack/web-api 5.15.0, so the error didn’t tell me the URL.

My main questions are:

  • Is there a way to prevent getting rate-limit responses without retrySec information?
  • Are there any ideas about what causes this?
  • Is there a best practice for handling these?

Thanks in advance

Packages:

Select all that apply:

  • @slack/web-api
  • @slack/rtm-api
  • @slack/webhooks
  • @slack/oauth
  • @slack/socket-mode
  • I don’t know

Reproducible in:

npm ls | grep -o "\S\+@\S\+$" | tr @ ' ' | awk -v q='"' '{print q$1q": "q"^"$2q","}' | grep slack
node --version
sw_vers && uname -v # or `ver`

The Slack SDK version

@slack/web-api: 5.15.0

Node.js runtime version

v15.14.0

OS info

Chrome/98.0.4758.80 Macintosh; Intel Mac OS X 10_15_7

Steps to reproduce:

(Share the commands to run, source code, and project settings)

  1. Send a slack message
  2. Receive a 429
  3. Miss the timeout

Expected result:

I’m not sure; perhaps the client could assume a default timeoutSec?
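
Roughly, the kind of caller-side workaround I have in mind would be the following (just a sketch; the 30-second default and the message check are values I picked, not anything the SDK actually provides):

// Hypothetical caller-side workaround, not SDK behavior: retry once with a
// default delay when the Retry-After header could not be parsed.
import { WebClient } from '@slack/web-api';

const client = new WebClient(process.env.SLACK_TOKEN);
const DEFAULT_RETRY_SEC = 30; // assumed default; the SDK does not define one

async function postWithFallback(channel: string, text: string) {
  try {
    return await client.chat.postMessage({ channel, text });
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    if (message.includes('Retry header did not contain a valid timeout')) {
      // No usable Retry-After value: wait a default amount and try once more.
      await new Promise((resolve) => setTimeout(resolve, DEFAULT_RETRY_SEC * 1000));
      return client.chat.postMessage({ channel, text });
    }
    throw err;
  }
}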

Actual result:

(Tell what actually happened with logs, screenshots)

Requirements

For general questions/issues about the Slack API platform or its server-side behavior, please submit questions at https://my.slack.com/help/requests/new instead. 🙇

Please read the Contributing guidelines and Code of Conduct before creating this issue or pull request. By submitting, you are agreeing to those rules.

About this issue

  • Original URL
  • State: open
  • Created 2 years ago
  • Comments: 21 (12 by maintainers)

Most upvoted comments

FYI @meichstedt I just published web-api 6.7.0 to npm.

Alright, I’ve dug around, and I believe I can find the relevant logs from the backend for these calls. I see many calls to the conversations.list API with a status=ok log, followed by a single error=rate_limited log. Note that the conversations.list API is a Tier 2 rate limited API, which officially only supports about 20 requests per minute. By my rough calculations, your app is issuing about 10 requests per second. So most definitely your app is bound to hit the rate limit for this API quite quickly 😄
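
As a side note, a rough sketch of pacing conversations.list calls to stay within that Tier 2 budget could look like the following (the 3-second spacing is simply 60 seconds divided by 20 requests; the helper itself is only an illustration, not SDK functionality):

import { WebClient } from '@slack/web-api';

const client = new WebClient(process.env.SLACK_TOKEN);

// Tier 2 budget: roughly 20 requests per minute, i.e. about one request every 3 seconds.
const MIN_SPACING_MS = 3000;
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

// Illustrative helper: walk conversations.list pages no faster than the Tier 2 budget.
async function listAllChannelNamesSlowly(): Promise<string[]> {
  const names: string[] = [];
  let cursor: string | undefined;
  do {
    const page = await client.conversations.list({ cursor, limit: 200 });
    for (const channel of page.channels ?? []) {
      if (channel.name) names.push(channel.name);
    }
    cursor = page.response_metadata?.next_cursor || undefined;
    if (cursor) await sleep(MIN_SPACING_MS);
  } while (cursor);
  return names;
}

With pagination spaced out like this, the app should rarely see a 429 from this method in the first place.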

Unfortunately, on our side, while we do log the request parameters and some other helpful tidbits, we do not log the full response returned. As such, I am limited to trying to work backwards through the logic of the backend code to see how this situation could arise. That said, we do log a specific SHA of the backend code as well as a kind of stack trace through the backend code - so I can see the separate paths of logic executed between the rate-limited call and the successful calls.

I will list out my assumptions based on what I’ve seen from the backend code and how the node-slack-sdk parses the headers:

  1. I am 99.9% sure that some kind of value is returned by the backend in the Retry-After HTTP response header - what kind of value, I am not sure and it is hard for me to infer as it comes from a separate “quota” system. There is some float math applied to this value before writing it to the response header as it seems the quota system stores rate limits in nanoseconds whereas the Retry-After header returns it to clients as seconds. So perhaps there is an issue here in the float math followed by casting to an integer.
  2. Looking at the node-slack-sdk’s parseRetryHeaders method, if the header is defined and can be parsed as an integer, then the exception you noted in your original post should never arise. However, if the header is not present or it cannot be parsed as an integer / ends up as NaN, then the exception you noted is raised (see the simplified sketch just after this list).
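
To make point 2 concrete, here is a simplified sketch of that parsing behavior (not the SDK’s actual code, only its general shape), showing which Retry-After values survive the integer parse and which end up as NaN and trigger the exception:

// Simplified approximation of Retry-After parsing; the real parseRetryHeaders
// implementation lives in the SDK, this is only for illustration.
function parseRetryAfterSec(headerValue: string | undefined): number {
  if (headerValue === undefined) {
    throw new Error('Retry header did not contain a valid timeout');
  }
  const seconds = parseInt(headerValue, 10);
  if (Number.isNaN(seconds)) {
    throw new Error('Retry header did not contain a valid timeout');
  }
  return seconds;
}

console.log(parseRetryAfterSec('30'));   // 30
console.log(parseRetryAfterSec('1.5'));  // 1 (truncated by the integer parse)
console.log(parseRetryAfterSec(' 42 ')); // 42
// parseRetryAfterSec(undefined);        // throws: header missing
// parseRetryAfterSec('');               // throws: empty string parses to NaN
// parseRetryAfterSec('NaN');            // throws: non-numeric value parses to NaN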

I think the key to solving this problem lies in identifying what was returned in these rate-limited response headers.

Perhaps as a baby step to helping with this issue, I can improve the exception being raised such that it records the values returned in the HTTP response’s Retry-After header? This way, whenever this situation arises again, at least we can get a clue that would be helpful in determining how this could arise, and could inform our next step after that.
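
Sketched in isolation, the improvement I have in mind might look something like this (the message format is illustrative only, not final wording):

// Sketch of a more informative exception: include whatever the Retry-After
// header actually contained, so future reports carry the raw value.
function parseRetryAfterOrExplain(headerValue: string | undefined): number {
  const seconds = headerValue === undefined ? NaN : parseInt(headerValue, 10);
  if (Number.isNaN(seconds)) {
    throw new Error(
      'Retry header did not contain a valid timeout ' +
      `(Retry-After header value: ${JSON.stringify(headerValue)})`,
    );
  }
  return seconds;
}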

Sorry this is so indirect / taking so long! However, if you have the time and patience to work with me on this issue, I hope we can get to the bottom of it 😃