opentelemetry-python: (exporter-jaeger-thrift): Exception in forked environments

Describe your environment OTel packages v1.11.1 and v0.30b1. Python 3.8.11, but the Python version should be irrelevant to this problem. Bug spotted on macOS, but it should happen on all systems.

Steps to reproduce This is really hard to reproduce because it depends heavily on timing. It’s probably a race condition among threads.

What is the expected behavior? The exception should not be raised.

What is the actual behavior? This is the exception:

Traceback (most recent call last):
  File "/.../site-packages/opentelemetry/sdk/trace/export/__init__.py", line 358, in _export_batch
    self.span_exporter.export(self.spans_list[:idx])  # type: ignore
  File "/.../site-packages/opentelemetry/exporter/jaeger/thrift/__init__.py", line 219, in export
    self._agent_client.emit(batch)
  File "/.../site-packages/opentelemetry/exporter/jaeger/thrift/send.py", line 70, in emit
    self.client.emitBatch(batch)
  File "/.../site-packages/opentelemetry/exporter/jaeger/thrift/gen/agent/Agent.py", line 61, in emitBatch
    self.send_emitBatch(batch)
  File "/.../site-packages/opentelemetry/exporter/jaeger/thrift/gen/agent/Agent.py", line 64, in send_emitBatch
    self._oprot.writeMessageBegin('emitBatch', TMessageType.ONEWAY, self._seqid)
  File "/.../site-packages/thrift/protocol/TCompactProtocol.py", line 157, in writeMessageBegin
    assert self.state == CLEAR
AssertionError

Additional context None.

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 20 (10 by maintainers)

Most upvoted comments

Currently, the BatchSpanProcessor can be considered to support forked environments on its own, meaning that its thread creation, API surface, and worker behavior should all work in a forked environment.

@jpmelos

I said that because it is the processor creating the thing that causes the trouble, which is the thread that uses the exporter. If that thread didn’t exist, this problem also probably wouldn’t exist.

What “causes trouble” is using the processor together with an exporter that does NOT support forked environments. If an exporter is designed to be used in a forked environment, or if it holds no state at all (it simply exports, as you mentioned above), this “troubled state” would not occur either. It is not solely due to the thread existing in the batch span processor. This seems to me (as @srikanthccv pointed out) to be the responsibility of the exporter. I do not believe it is the responsibility of the span processor to make sure that its components (the exporter) support forked environments, only that the processor itself does.

With that being said, to say that “my OpenTelemetry telemetry pipeline works in a forked environment” is a bold statement that has not been promised. This probably would extend to other custom components as well (if they ever have workers), not just the BatchSpanProcessor. We can provide guidance (similar to what @srikanthccv suggested for hooks) if any of these issues ever come up, to make those components “fork environment-compatible”.
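As a rough illustration of what such guidance could look like, here is a minimal sketch of a stateful component that resets itself in a forked child using the standard library's `os.register_at_fork`. `StatefulClient` and `ForkAwareExporter` are hypothetical names made up for this example; they are not the jaeger-thrift exporter or any OpenTelemetry API, and this is not necessarily how the actual fix was done.

```python
import os
import threading
import weakref


class StatefulClient:
    """Hypothetical stand-in for a client that keeps per-connection state."""

    def __init__(self):
        self.pid = os.getpid()  # process that created this client
        self.sent = []

    def emit(self, batch):
        self.sent.append(batch)


class ForkAwareExporter:
    """Hypothetical exporter that rebuilds its client in a forked child."""

    def __init__(self):
        self._lock = threading.Lock()
        self._client = StatefulClient()
        if hasattr(os, "register_at_fork"):  # Unix only, Python 3.7+
            # Weak reference so the fork hook does not keep the exporter alive.
            ref = weakref.WeakMethod(self._reinit_after_fork)

            def _hook():
                method = ref()
                if method is not None:
                    method()

            os.register_at_fork(after_in_child=_hook)

    def _reinit_after_fork(self):
        # The child inherits copies of the parent's lock and client state,
        # possibly mid-operation, so replace both with fresh objects.
        self._lock = threading.Lock()
        self._client = StatefulClient()

    def export(self, batch):
        # Serialize access so two threads never share the client mid-write.
        with self._lock:
            self._client.emit(batch)
```

The idea is that the child process never reuses whatever half-written state the parent's client had at the moment of the fork. Whether this alone would address the assertion above depends on the exporter's internals, which this sketch does not model.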

Any other opinions?

It doesn’t matter whether the span processor used is Simple or Batch; the exception will still be raised, since the root cause is the jaeger-thrift exporter.

@srikanthccv if the issue is not with BatchSpanProcessor, would you mind updating the issue title to reflect the issue better?