opentelemetry-python: Default otel reporter timeout of 900s is too long.
Because this timeout can be hit while a process is closing or exiting, the process may spend a significant amount of time just retrying before it finally ends.
For example, it’s easy to misconfigure the target service:
status = StatusCode.UNAVAILABLE
details = "DNS resolution failed for service: opencensus-collector:55680"
which can cause processes to easily hang at exit (and print nothing by default).
About this issue
- State: closed
- Created 3 years ago
- Comments: 18 (16 by maintainers)
Thanks for digging into this @lonewolf3739. Agree we should clarify with spec or at least with other SIG maintainers.
And we should probably have separate tickets for possibly reducing the exponential backoff max time and for adding a grace period to exporters on shutdown.
Assigning self to address this and related issues.
900 seconds is too high for a maximum backoff time. Generally it is either 32 or 64 seconds, and sometimes it can be a little higher depending on the use case. I couldn’t find the maximum value recommended by OTLP, but I assume it wouldn’t be as high as 900 seconds. We can probably raise an issue on the spec repo.
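To make the difference concrete, here is a minimal sketch of a capped exponential backoff schedule. It is not the exporter's actual code: the function names, the base delay of 1 second, the doubling factor, and the rule that retrying stops once the cap is reached are all assumptions chosen for illustration.

```python
def capped_backoff(base=1.0, factor=2.0, max_delay=64.0):
    """Yield retry delays that grow by `factor` until they reach `max_delay`.

    Hypothetical helper: the schedule ends with one final wait of
    `max_delay` seconds (assumed stopping rule, not taken from the exporter).
    """
    delay = base
    while delay < max_delay:
        yield delay
        delay *= factor
    yield max_delay


def total_wait(max_delay):
    """Worst-case total time spent sleeping across the whole schedule."""
    return sum(capped_backoff(max_delay=max_delay))


if __name__ == "__main__":
    for cap in (32.0, 64.0, 900.0):
        print(f"cap={cap:>5}s -> worst-case wait {total_wait(cap)}s")
```

Under these assumptions, the worst-case wait grows from roughly two minutes with a 64 second cap to over half an hour with a 900 second cap, which is the kind of hang described above.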
I agree it is probably a bit too much, but reducing it should not be the solution to services freezing at shutdown. At shutdown, exporters/processors should be given a grace period to shut down cleanly, and if they don’t, the pipeline should force them to shut down and exit. We should have a different issue for that.
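A rough sketch of the grace-period idea, using only the standard library. The helper name `shutdown_with_grace` and its parameters are hypothetical and not part of opentelemetry-python. The shutdown routine runs in a daemon thread, so if it is still busy when the grace period ends, the caller stops waiting and the interpreter can still exit.

```python
import threading
import time


def shutdown_with_grace(shutdown_fn, grace_seconds):
    """Run `shutdown_fn` and wait at most `grace_seconds` for it to finish.

    Returns True if shutdown completed in time, False if it was abandoned.
    Hypothetical helper for illustration only.
    """
    # Daemon thread: a stuck shutdown will not keep the process alive.
    worker = threading.Thread(target=shutdown_fn, daemon=True)
    worker.start()
    worker.join(timeout=grace_seconds)
    return not worker.is_alive()


if __name__ == "__main__":
    # A well-behaved exporter finishes inside the grace period.
    print(shutdown_with_grace(lambda: time.sleep(0.01), grace_seconds=1.0))
    # An exporter stuck retrying is abandoned once the grace period ends.
    print(shutdown_with_grace(lambda: time.sleep(5), grace_seconds=0.1))
```

The trade-off in this sketch is that any telemetry still queued when the grace period expires is dropped, so the grace period should be long enough for a normal flush but short enough that a misconfigured endpoint cannot stall process exit.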