opentelemetry-python: Application hangs if Otel collector is down
I initialize the tracer like this:
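from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor

# `service` is the service name string, defined elsewhere in the application.
trace.set_tracer_provider(TracerProvider(
    resource=Resource.create({SERVICE_NAME: service}),
))
otlp_exporter = OTLPSpanExporter(endpoint="http://otel-collector:4317", insecure=True)
trace.get_tracer_provider().add_span_processor(
    SimpleSpanProcessor(otlp_exporter)
)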
It seems that if the otel-collector is down, the entire application hangs and then crashes once it tries to send a span to the collector.
I would expect that maybe it would send a warning or give an error, but it should not hang the entire application.
Is there a way to avoid this?
About this issue
- State: closed
- Created 3 years ago
- Comments: 25 (11 by maintainers)
@srikanthccv I believe I got it solved. Gunicorn automatically converts blocking code to non-blocking (using monkey-patching) when the gevent worker type is used. https://github.com/benoitc/gunicorn/blob/master/gunicorn/workers/ggevent.py#L38
But for it to work properly, monkey-patching must be the first thing that runs, before any other modules are imported; otherwise the previously imported modules will still run with blocking code. I had imported these opentelemetry modules in the gunicorn.py config file, but did not use them anywhere. https://github.com/rdpravin1895/opentelemetry-django-test/blob/master/NGANROCIAPI/gunicorn.py#L8-L14
After removing these imports, I don't see the blocking anymore.
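To illustrate why the import order matters, here is a minimal standard-library sketch. It does not use gevent or Gunicorn; `cooperative_sleep` is a hypothetical stand-in for a patched function, and the attribute swap only imitates what a monkey-patching library does. A name bound before the patch keeps pointing at the original blocking function, while a name bound afterwards sees the replacement.

```python
import time

# Bound BEFORE patching: this name keeps pointing at the original function.
from time import sleep as early_sleep

original_sleep = time.sleep
calls = []


def cooperative_sleep(seconds):
    """Hypothetical stand-in for a patched, non-blocking sleep."""
    calls.append(seconds)


# Imitate a monkey-patching library: swap the attribute on the module.
time.sleep = cooperative_sleep
try:
    # Bound AFTER patching: this name resolves to the replacement.
    from time import sleep as late_sleep

    late_sleep(0.01)   # routed to cooperative_sleep and recorded in `calls`
    early_sleep(0.01)  # still the original blocking sleep, not recorded
finally:
    time.sleep = original_sleep  # always restore the real function

early_is_original = early_sleep is original_sleep
late_is_patched = late_sleep is cooperative_sleep
```

Modules imported before the patch can hold references like `early_sleep`, which would be consistent with the blocking going away once the early imports were removed from the config file.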
@gionapaolini please close the issue if it solved your problem. Let us know if you have any other questions.
(Sorry, I mistakenly closed the issue prematurely.) Thanks @lonewolf3739, it makes sense. I will try it and close the issue.
Ah sorry, I didn’t notice the example. You shouldn’t be using SimpleSpanProcessor for real-world production purposes since it is blocking by design. You need to use BatchSpanProcessor.
For now I solved the issue by using a jaeger agent and jaeger exporter:
It gives an instant error, though, so it probably does not retry at all. But for my use case, I still prefer it this way.
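On the earlier advice to prefer BatchSpanProcessor over SimpleSpanProcessor: in the snippet from the question, that would mean passing the exporter to BatchSpanProcessor (also in opentelemetry.sdk.trace.export) instead. The difference in behaviour can be sketched with the standard library alone. This is not OpenTelemetry's implementation; `InlineProcessor`, `BackgroundProcessor` and `slow_export` are hypothetical names, and the 0.2 second delay just simulates a collector that is slow to respond.

```python
import queue
import threading
import time


def slow_export(span):
    """Hypothetical exporter: simulates a collector that is slow to answer."""
    time.sleep(0.2)


class InlineProcessor:
    """Exports in the caller's thread, so on_end blocks until export returns."""

    def __init__(self, export):
        self._export = export

    def on_end(self, span):
        self._export(span)


class BackgroundProcessor:
    """Queues spans and exports them from a daemon worker thread."""

    def __init__(self, export, max_queue_size=2048):
        self._export = export
        self._queue = queue.Queue(maxsize=max_queue_size)
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def on_end(self, span):
        try:
            self._queue.put_nowait(span)  # never waits on the exporter
        except queue.Full:
            pass  # drop the span rather than block the application

    def _run(self):
        while True:
            span = self._queue.get()
            try:
                self._export(span)
            except Exception:
                pass  # an export failure must not kill the worker
            finally:
                self._queue.task_done()

    def flush(self):
        """Wait until every queued span has been handed to the exporter."""
        self._queue.join()


def timed(processor, span):
    """Return how long the caller spends inside on_end."""
    start = time.perf_counter()
    processor.on_end(span)
    return time.perf_counter() - start


inline_seconds = timed(InlineProcessor(slow_export), "span-1")
background = BackgroundProcessor(slow_export)
background_seconds = timed(background, "span-2")
background.flush()
```

With the inline variant the caller pays the full export delay on every span; with the background variant the caller only pays for a queue insert, and a full queue drops spans instead of stalling the request.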