opentelemetry-go: Add() is not thread-safe when called with more than one attribute

Description

.Add() is not thread-safe when used with more than one attribute

Environment

  • OS: OS X
  • Architecture: M1
  • Go Version: 1.19
  • opentelemetry-go version: 1.14.0

Steps To Reproduce

  1. Create a Counter
  2. Call .Add(() with 2 attributes, from multiple Go routines.
  3. See error

Expected behavior

It should work without errors.

Here’s the race detector stack race:

==================
WARNING: DATA RACE
Read at 0x00c000591100 by goroutine 654:
  go.opentelemetry.io/otel/attribute.(*Sortable).Less()
      /Users/abcd/go/pkg/mod/go.opentelemetry.io/otel@v1.14.0/attribute/set.go:423 +0xb4
  sort.insertionSort()
      /opt/homebrew/Cellar/go/1.19.3/libexec/src/sort/zsortinterface.go:12 +0xd8
  sort.stable()
      /opt/homebrew/Cellar/go/1.19.3/libexec/src/sort/zsortinterface.go:343 +0x70
  sort.Stable()
      /opt/homebrew/Cellar/go/1.19.3/libexec/src/sort/sort.go:208 +0x44
  go.opentelemetry.io/otel/attribute.NewSetWithSortableFiltered()
      /Users/abcd/go/pkg/mod/go.opentelemetry.io/otel@v1.14.0/attribute/set.go:257 +0x90
  go.opentelemetry.io/otel/attribute.NewSet()
      /Users/abcd/go/pkg/mod/go.opentelemetry.io/otel@v1.14.0/attribute/set.go:194 +0xa4
  go.opentelemetry.io/otel/sdk/metric.(*instrumentImpl[...]).aggregate()
      /Users/abcd/go/pkg/mod/go.opentelemetry.io/otel/sdk/metric@v0.37.0/instrument.go:198 +0xdc
  go.opentelemetry.io/otel/sdk/metric.(*instrumentImpl[...]).Add()
      /Users/abcd/go/pkg/mod/go.opentelemetry.io/otel/sdk/metric@v0.37.0/instrument.go:186 +0xa0
  go.opentelemetry.io/otel/sdk/metric.(*instrumentImpl[...]).Add()
      <autogenerated>:1 +0x68
  aacb/common/metrics.(*counter).Add()
      /Users/abcd/prg/aacb/common/metrics/counter.go:43 +0xa8
  ...
  (REDACTED)

Previous write at 0x00c000591100 by goroutine 651:
  go.opentelemetry.io/otel/attribute.NewSetWithSortableFiltered()
      /Users/abcd/go/pkg/mod/go.opentelemetry.io/otel@v1.14.0/attribute/set.go:275 +0x2a8
  go.opentelemetry.io/otel/attribute.NewSet()
      /Users/abcd/go/pkg/mod/go.opentelemetry.io/otel@v1.14.0/attribute/set.go:194 +0xa4
  go.opentelemetry.io/otel/sdk/metric.(*instrumentImpl[...]).aggregate()
      /Users/abcd/go/pkg/mod/go.opentelemetry.io/otel/sdk/metric@v0.37.0/instrument.go:198 +0xdc
  go.opentelemetry.io/otel/sdk/metric.(*instrumentImpl[...]).Add()
      /Users/abcd/go/pkg/mod/go.opentelemetry.io/otel/sdk/metric@v0.37.0/instrument.go:186 +0xa0
  go.opentelemetry.io/otel/sdk/metric.(*instrumentImpl[...]).Add()
      <autogenerated>:1 +0x68
  aacb/common/metrics.(*counter).Add()
      /Users/abcd/prg/aacb/common/metrics/counter.go:43 +0xa8
  ...
  (REDACTED)

==================

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 39 (34 by maintainers)

Most upvoted comments

This specification issue is looking to add an additional argument to Record and possibly Add. I had considered migrating the attribute parameter to our options pattern (i.e. WithAttributes(/*...*/)), but originally thought it would be overkill for the methods as I didn’t expect any other options to be added.

That assumption appears to have been incorrect. I’ll plan to look into a proposal that will take that approach.

Another suggestion from the SIG meeting today (cc @MadVikingGod):

Have methods that accept no attributes and an equivalent *WithAttributes method to accept the attributes as an attribute.Set. For example the following methods would be used:

type SyncAdder interface {
	Add(ctx context.Context, value N)
	AddWithAttributes(ctx context.Context, value N, attrs attribute.Set)
}

type SyncRecorder interface {
	Record(ctx context.Context, value N)
	RecordWithAttributes(ctx context.Context, value N, attrs attribute.Set)
}

type AsyncObserver interface {
	Observe(value N)
	ObserveWithAttributes(value N, attrs attributes.Set)
}

This would help eliminate the boilerplate of calling *attribute.EmptySet() when no attributes are wanted.

It’s not immediately clear if this will comply with the specification though, i.e.:

Add

Increment the Counter by a fixed amount.

This API SHOULD NOT return a value (it MAY return a dummy value if required by certain programming languages or systems, for example null, undefined).

This API MUST accept the following parameter:

[…]

Users can provide attributes to associate with the increment value, but it is up to their discretion. Therefore, this API MUST be structured to accept a variable number of attributes, including none.

If the specification is talking about a logical API, I think having multiple methods to satisfy it is compliant.

@MrAlias Yes please 😃 . Taking an immutable set will also avoid allocating and copying the attributes each time we need to record an event.

So the attributes can be modified “during” the call by a separate goroutine.

I don’t think so. The original caller is still active on the call stack, so if the slice is modified at this point it means the slice was modified by the caller during its own call.

I’m still concerned about an additional, obligatory memory allocation that results from having a single API accept the current attribute.Set. It may be possible to work around some of the limitations by redesigning attribute.Set (and I think we may want to support context-scoped attributes) but as the current code stands, the problem allocation comes from code organized to build an attribute set incrementally, like:

var attrsPool = sync.Pool{
	New: func() any {
		p := new([]attribute.KeyValue)
                 *p = make([]attribute.KeyValue, 0, 16)
                return p
	},
}

func (thing *Thing) instrumentOperation(ctx context.Context, value1, value2, value3 float64, moreAttrs ...attribute.KeyValue) {
  attrs := attrsPool.Get().(*[]attribute.KeyValue)

  // get some attributes from the "scope" (thing.myAttributes...)
  // get some attributes from the context (e.g., from baggage...)
  // get the attributes from the call site, if any (i.e., moreAttrs...) 
  // the slice of attributes is assembled
  instrument1.Add(ctx, value1, *attrs...)
  instrument2.Add(ctx, value2, *attrs...)
  instrument3.Record(ctx, value3, *attrs...)
  attrsPool.Put(attrs)
}

The user in this case already has organized their code to produce a list of attributes, as typical of a statsd library. If we replace the three instrument operations w/ a shared attribute.Set, that is an improvement in a sense, because if the SDK will compute the attribute.Set each time then letting the user compute it once is an improvement.

In the case of the Lightstep metrics SDK, I mentioned the use of a map[uint64]*record which avoids constructing the attribute.Set if the entry already exists, as determined by a fingerprint function. That is, there are no allocations inside the SDK, whereas attribute.Set would be a new allocation if it were to be required. The LS metrics SDK will compute the fingerprint 3 times, but it would still not allocate. (Note that Prometheus server and its Go client library use a similar organization, a map of fingerprint referring to a list of entries.)

This leads me to think that we should find a way to eliminate attribute.Set in favor of something better, call it the AttributeSetBuilder to be concrete. I don’t have a prototype to show, but I wonder if it’s possible to take the best of both cases, something that allows an incrementally-built list of KeyValue to be reported, so that the attribute.Set is only computed if necessary (once, lazily) and that a fingerprint is also computed only once.

I guess this leads me to agree that it would be nice for high-performance applications to have an option to amortize the construction cost of attribute sets; it’s also possible for SDKs to avoid the attribute.Set allocation itself. Can we have the best of both?

The concern I have is less about the boilerplate, and more that we want to encourage people to use a particular empty set (*attribute.EmptySet()) and how that is different from the norms of go. If nil or attributes.Set{} were our recommendation then there really isn’t a need for this extra api.

After #3957 they will be able to pass attribute.Set{} and they can currently just use attribute.NewSet() instead of *attribute.EmptySet().

We could comment attribute.EmptySet to say to use it if you need a pointer (though new(attribute.Set) will also do).

I think, I would prefer to have a single method. It reduces the API surface which IMO has benefits for both users of the API and for the developers of API implementations. Users who do not like boilerplate can always create their own helpers/adapters.

The concern I have is less about the boilerplate, and more that we want to encourage people to use a particular empty set (*attribute.EmptySet()) and how that is different from the norms of go. If nil or attributes.Set{} were our recommendation then there really isn’t a need for this extra api.

After https://github.com/open-telemetry/opentelemetry-go/pull/3957 they will be able to pass attribute.Set{} and they can currently just use attribute.NewSet() instead of *attribute.EmptySet().

We could comment attribute.EmptySet to say to use it if you need a pointer (though new(attribute.Set) will also do).

Even if we have a way to pass attribute.Set to the metrics API it will not be as fast as the hypothetical bound instrument API.

Sure, but is it less performant than it is more performant for a user to just re-use a set instead of allocating a new group of attribute args each time? It seems like there is an order of magnitude between the two and accepting a set allows the user to choose how performant they want to be or not without us having to add an additional bound instrument API

If a caller is currently repeating attribute sets but has no good way to cache the result of calculating an attribute.Set, now they’ll require two allocations at every call site, one to construct a temporary list and one to construct a set

I think that’s what’s really happening right now, no?

You have:

myCounter.Add(context.Background(), 1, attribute.String("x", "y"), attribute.Int("z", 2))

The attrs ...attribute.KeyValue argument, gets converted into a []attribute.KeyValue and it needs an allocation. Later, it gets internally converted into a attribute.Set, allocating the set.

@jmacd What if AddWithAttributes(context.Context, value int64, attribute.Set) is Add(context.Context, value int64, attribute.Set), alongside Add(context.Context, value int64, ...attribute.KeyValue)?

In my opinion, the current API is working as designed. Comments on the synchronous instruments should state that the input slice of ...attribute.KeyValue is meant to be the one a compiler would generate using the arguments supplied at the call site, because the SDK will perform the necessary sorting step in place without an allocation.

By the way, this conversation is related to https://github.com/open-telemetry/opentelemetry-collector/issues/7455 where I’ve stated that even calling attribute.NewSet() is too expensive for every synchronous operation.

I think it should be possible to work around the race condition by pre-sorting the list of attributes. For example, I expect attribute.NewSet(input...).ToSlice() to return the correctly sorted, deduplicated attributes and I think it ought to not race. (Does sort.Stable() modify an already-sorted slice? If the answer is yes, probably we should check whether the slice is sorted before sorting it, to avoid this race.

I think it would be appealing to support alternative APIs with both variations, meaning for every Add(context.Context, value int64, ...attribute.KeyValue) there is a less ergonomic form like AddWithAttributes(context.Context, value int64, attribute.Set). This would give the user the choice to have a simple (familiar) API or a more sophisticated one when performance really matters. I also think this is a nice step short of “bound” instruments. I think https://github.com/open-telemetry/opentelemetry-go/pull/3947 is a step too far. By the way, I think the specification supports this direction, with this text:

The [OpenTelemetry API](../overview.md#api) authors MAY decide to allow flexible
[attributes](../common/README.md#attribute) to be passed in as arguments. If
the attribute names and types are provided during the [counter
creation](#counter-creation), the [OpenTelemetry API](../overview.md#api)
authors MAY allow attribute values to be passed in using a more efficient way
(e.g. strong typed struct allocated on the callstack, tuple). The API MUST allow
callers to provide flexible attributes at invocation time rather than having to
register all the possible attribute names during the instrument creation.

Hi,

Can I take this bug?