opentelemetry-python: Application hangs if Otel collector is down

I initialize the tracer like this:

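    from opentelemetry import trace
    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
    from opentelemetry.sdk.resources import SERVICE_NAME, Resource
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import SimpleSpanProcessor

    trace.set_tracer_provider(TracerProvider(
        resource=Resource.create({SERVICE_NAME: service}),
    ))

    otlp_exporter = OTLPSpanExporter(endpoint="http://otel-collector:4317", insecure=True)

    # SimpleSpanProcessor hands each span to the exporter as soon as it ends
    trace.get_tracer_provider().add_span_processor(
        SimpleSpanProcessor(otlp_exporter)
    )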

It seems that if the otel-collector is down, the entire application hangs and then crashes when it tries to send a span to the collector.

I would expect it to log a warning or raise an error, but it should not hang the entire application.

Is there a way to avoid this?

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 25 (11 by maintainers)

Most upvoted comments

@srikanthccv I believe I've solved it. Gunicorn automatically converts blocking code to non-blocking code (using monkey-patching) when the gevent worker type is used. https://github.com/benoitc/gunicorn/blob/master/gunicorn/workers/ggevent.py#L38

But for this to work properly, monkey-patching must run before any other modules are imported; otherwise, modules imported earlier still run with blocking code. I had imported these opentelemetry modules in my gunicorn.py config file, but never used them there. https://github.com/rdpravin1895/opentelemetry-django-test/blob/master/NGANROCIAPI/gunicorn.py#L8-L14

After removing these imports, I don't see the blocking anymore.
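The import-ordering problem can be illustrated without gevent. The following stdlib-only sketch is a toy analogy, not gevent's actual implementation: `cooperative_sleep` is a hypothetical stand-in for a patched function, and `time.sleep` is patched by hand instead of by `monkey.patch_all()`. A name bound with `from ... import` before the patch keeps pointing at the original object, which is the general reason modules imported before monkey-patching can keep their blocking behavior.

```python
import time
from time import sleep as sleep_bound_early  # name bound BEFORE the "patch"

# Hypothetical stand-in for a cooperative (gevent-style) sleep.
# It only records calls so we can see which implementation ran.
patched_calls = []


def cooperative_sleep(seconds):
    patched_calls.append(seconds)


def demo():
    """Patch time.sleep by hand, then compare late vs. early bindings."""
    original_sleep = time.sleep
    time.sleep = cooperative_sleep  # toy "monkey-patch" of the module attribute
    try:
        # Looked up through the module at call time, so it sees the patch.
        time.sleep(0)
        # Bound at import time, so it still refers to the original function.
        uses_original = sleep_bound_early is original_sleep
        sleep_bound_early(0)  # really calls the unpatched time.sleep
    finally:
        time.sleep = original_sleep  # undo the patch
    return uses_original


print(demo())  # prints True
```

Real monkey-patching libraries replace module attributes in the same way, so anything that grabbed a reference (or created sockets or threads) before the patch is not converted.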

@gionapaolini please close the issue if it solved your problem. Let us know if you have any other questions.

(Sorry, I closed the issue prematurely by mistake.) Thanks @lonewolf3739, that makes sense; I will try it and then close the issue.

Ah, sorry, I didn't notice the example. You shouldn't use SimpleSpanProcessor in real-world production, since it is blocking by design. Use BatchSpanProcessor instead.

For now, I solved the issue by using a Jaeger agent and the Jaeger exporter:

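    # Thrift-based Jaeger exporter (opentelemetry-exporter-jaeger-thrift package)
    from opentelemetry.exporter.jaeger.thrift import JaegerExporter

    jaeger_exporter = JaegerExporter(
        agent_host_name='jaeger-agent',
        agent_port=6831,
        collector_endpoint='http://otel-collector:14268/api/traces?format=jaeger.thrift'
    )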

However, it fails immediately with an error, so it probably does not retry at all. For my use case, I still prefer it this way.