opentelemetry-go: Resource merge semantics make upgrades problematic

Description

We’ve encountered issues upon a recent upgrade to OTEL v1.1.0 (which upgraded to semconv v1.7.0) and existing code or contrib code which still used semconv v1.4.0. See https://github.com/open-telemetry/opentelemetry-go-contrib/issues/1384 for an example.

The spec (https://github.com/open-telemetry/opentelemetry-specification/blob/bad49c714a62da5493f2d1d9bafd7ebe8c8ce7eb/specification/resource/sdk.md#merge) seems a bit contradictory in how this should be handled:

The interface MUST provide a way for an old resource and an updating resource to be merged into a new resource.

while also mentioning the merge algorithm (which is implemented to the spec in opentelemetry-go) as:

Else this is a merging error (this is the case when the Schema URL of the old and updating resources are not empty and are different).

From a consumer perspective, there is no real way to determine if all of the resources are compatible prior to calling resource.New(…), and thus we’re failing to create a resource detector.

Instead of the current algorithm, it feels like if the two resource schemas being merged are compatible (same major version), the merge should be able to proceed. Otherwise, we’ll need to coordinate an upgrade of every telemetry package together to ensure we’re on a consistent version of semconv.

Possible algorithms:

  • Create merged resource using the oldest version of semconv. This might introduce new attributes which are introduced with later specs, however it should still be compatible with an older version.
  • Create merged resource using the newest version of semconv (perhaps with migration of certain well known attributes - i.e. rpc.jsonrpc.method -> rpc.method).
  • Create unversioned resource if the versions don’t match.

Environment

  • OS: Linux/Mac
  • Architecture: x86_64
  • Go Version: 1.16+
  • opentelemetry-go version: v1.1.0

Steps To Reproduce

  1. Configure a resource with a combination of resource detectors which use different versions of semconv package (i.e. v1.4.0 + v1.7.0).
  2. resource.New fails because two different semconv versions are in use: https://github.com/open-telemetry/opentelemetry-go/blob/7ce58f355851d0412e45ceb79d977bc612701b3f/sdk/resource/resource.go#L190-L192

Expected behavior

Resource detector should be able to be initialized.

About this issue

  • Original URL
  • State: open
  • Created 3 years ago
  • Reactions: 11
  • Comments: 23 (13 by maintainers)

Commits related to this issue

Most upvoted comments

We hit this issue multiple times already because of routine package updates and without any changes from our side. The main problem is that often the issue can only be noticed in production, and it feels like the impact is not justified by the severity of the problem we’re trying to solve using schema versions. I mean, should we really care if name of some field (which may never be really looked at) was changed and now we temporarily have duplicated field? Especially considering, that often resources define mutually exclusive set of fields.

While I can imagine the cases where it is important to enforce compatible schemas, maybe we can just introduce a flag to ignore schema version incompatibility?

@MrAlias How was this resolved? I got bit by this today.

Something I haven’t seen in the discussion currently is that semconv doesn’t seem to follow semver. The way these are versioned is that not all versions are compatible, even across minor/patch releases 😕 For example, I need to upgrade my code from v1.7.0 to v1.21.0, but both are under v1? Forcing me to do this to continue having working code seems like a major version bump?

@tonglil, semantic conventions versioning policy is defined here: https://opentelemetry.io/docs/specs/otel/versioning-and-stability/#semantic-conventions-stability. Take notice that v1.7.0 and v1.21.0 are different packages. This not “usual” SemVer.

Excepting that the current strategy adopted in the related opentelemetry-go-contrib packages fails to ensure that each detector/instrumentation is separately implemented and available for each semconv – instead it afixes only a single semconv per release.

https://github.com/open-telemetry/opentelemetry-go-contrib/issues/3657#issuecomment-1765527620

What’s more challenging still is that within that single package for a given release, there are currently three different semconvs in use – gcp even has differing semconv within the same ‘namespace’.

While it is not necessarily one to discuss here, the problems with interoperability due to brittle design decisions – particularly with distributed systems – is rather concerning to me as a consumer of the standard, especially as it is currently adopted in Go.

Right now I’m seeing version bumps due to security related issues for the contrib packages. The problem is that they don’t abide by the compatibility contract of golang modules. Arguably the semconv strategy and contract of otel conflicts with golang’s own module contract to the extent that IMHO each semconv alteration should see a similar vN release of a module in the opentelemetry-go-contrib space. That is, each otel semconv has its own package for each detector/instrumentation/instrgen etc module.

@pellared sorry to follow up but I’m looking for clarity on a couple of things.

  1. as casual consumers of otel, it’s a bit unreasonable to expect them to “know” that it follows a different meaning of semver.

  2. I understand they are different packages, but how come I have to change packages if I haven’t changed my code other than a dependency? If my app uses v1.7.0 to implement a schemaless detector (to set some attributes from the environment), shouldn’t it continue working even if the cloud provider detector (detecting other attributes) was upgraded to v1.21.0?

  3. It feels like resource.New should be versioned with semconv if it’s reliant on all detectors’ semconv versions matching.

Telemetry schema and their description of changes from one version to the next are documented in the specification. The gap lies in the resource specification including a requirement for schema URL support and stating that merging resources with different schema URLs is an error without including a requirement that schema translation be applied or that there be a mechanism for indicating that schema translation should be applied.

There was significant discussion of this topic at the maintainers’ meeting today and I expect there will be more discussion of this issue at the specification SIG meeting tomorrow. I’m sorry if my earlier comment appeared to imply that I did not care about this issue or that we were not intending to do anything about it. We certainly do intend to ensure that resources with associated telemetry schema can be used easily and safely. I think that the proposal @MrAlias has made in #3944 is good, and the implementation is the logical conclusion of work started almost two years ago when we began introducing telemetry schema. However, I think it does not go far enough and the spec should instead direct that telemetry schema migration SHOULD be used to align attributes on resources to be merged, either implicitly at the newer of the two versions or explicitly at the version provided by the user. We do, however, need to be careful to ensure that we do not introduce changes to our stable SDK that are incompatible with the specification or that get ahead of specification language under consideration that may change before it is stabilized.

Closing as this is now addressed with the OTel schema.

There are plans for the collector to have a processor with the same logic and the ability to normalize all incoming attributes to a given schema version.

This is a problem we’re in the process of addressing. The semantic conventions now include a schema that describes changes from version to version in a way that will enable migrating a resource from one version to the next. We recently landed https://github.com/open-telemetry/opentelemetry-go/pull/2267 which gives us the ability to parse those schema definitions. The next step is to implement the transformations required and then to update the Resource.Merge() method to migrate resources with differing schemas to a common schema.

maybe we can just introduce a flag to ignore schema version incompatibility?

It would be not compliant with the specification.

I created #4484 that should help to mitigate the issue.

Note for those following here: https://github.com/open-telemetry/opentelemetry-go/pull/4484#issuecomment-1710539976

SIG meeting: This API will not help a lot. Closing.

@MrAlias Will try to revive the translation layer.

maybe we can just introduce a flag to ignore schema version incompatibility?

It would be not compliant with the specification.

I created https://github.com/open-telemetry/opentelemetry-go/pull/4484 that should help to mitigate the issue.

Here’s a new repo I created to handle schema translations: https://github.com/MrAlias/otel-schema-utils

Only thing supported currently is resource translations.

I’m not sure there is anything we can do in the Go implementation at this time to improve the situation. The specification does not detail how to perform resource merging with multiple schema URLs involved, instead it defines this as an error and states that the resulting resource is undefined. There is a proposal to separate semantic conventions from the specification, which may result in a new schema family and versions, which I think would only further complicate this situation.

While I think it might be argued that it would be spec compliant to return an error and a merged resource with the schema URL removed, doing so would likely condition users to ignore the error and result in surprising behavior when the implementation changes and/or different behavior is specified.