opentelemetry-go: Add() is not thread-safe when called with more than one attribute
Description
.Add() is not thread-safe when used with more than one attribute.
Environment
- OS: OS X
- Architecture: M1
- Go Version: 1.19
- opentelemetry-go version: 1.14.0
Steps To Reproduce
- Create a Counter
- Call .Add() with 2 attributes, from multiple goroutines.
- See error
Expected behavior
It should work without errors.
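For context, here is a minimal sketch of the kind of code that produces the report below when run under the race detector. It is an illustration with hypothetical names, assuming the API shape at the reported versions (otel v1.14.0, sdk/metric v0.37.0) and, as the stack trace and later discussion suggest, a wrapper-style reuse of one attribute slice across goroutines:

```go
// Reproduction sketch (hypothetical names): run with `go run -race .`.
package main

import (
	"context"
	"sync"

	"go.opentelemetry.io/otel/attribute"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	ctx := context.Background()
	provider := sdkmetric.NewMeterProvider(
		sdkmetric.WithReader(sdkmetric.NewManualReader()),
	)
	defer provider.Shutdown(ctx)

	counter, err := provider.Meter("repro").Int64Counter("requests")
	if err != nil {
		panic(err)
	}

	// One attribute slice shared by every goroutine (as a wrapper might do).
	attrs := []attribute.KeyValue{
		attribute.String("k2", "v2"),
		attribute.String("k1", "v1"),
	}

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				// The SDK builds an attribute.Set for each call, sorting the
				// caller's slice in place, which races across goroutines.
				counter.Add(ctx, 1, attrs...)
			}
		}()
	}
	wg.Wait()
}
```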
Here’s the race detector stack trace:
==================
WARNING: DATA RACE
Read at 0x00c000591100 by goroutine 654:
go.opentelemetry.io/otel/attribute.(*Sortable).Less()
/Users/abcd/go/pkg/mod/go.opentelemetry.io/otel@v1.14.0/attribute/set.go:423 +0xb4
sort.insertionSort()
/opt/homebrew/Cellar/go/1.19.3/libexec/src/sort/zsortinterface.go:12 +0xd8
sort.stable()
/opt/homebrew/Cellar/go/1.19.3/libexec/src/sort/zsortinterface.go:343 +0x70
sort.Stable()
/opt/homebrew/Cellar/go/1.19.3/libexec/src/sort/sort.go:208 +0x44
go.opentelemetry.io/otel/attribute.NewSetWithSortableFiltered()
/Users/abcd/go/pkg/mod/go.opentelemetry.io/otel@v1.14.0/attribute/set.go:257 +0x90
go.opentelemetry.io/otel/attribute.NewSet()
/Users/abcd/go/pkg/mod/go.opentelemetry.io/otel@v1.14.0/attribute/set.go:194 +0xa4
go.opentelemetry.io/otel/sdk/metric.(*instrumentImpl[...]).aggregate()
/Users/abcd/go/pkg/mod/go.opentelemetry.io/otel/sdk/metric@v0.37.0/instrument.go:198 +0xdc
go.opentelemetry.io/otel/sdk/metric.(*instrumentImpl[...]).Add()
/Users/abcd/go/pkg/mod/go.opentelemetry.io/otel/sdk/metric@v0.37.0/instrument.go:186 +0xa0
go.opentelemetry.io/otel/sdk/metric.(*instrumentImpl[...]).Add()
<autogenerated>:1 +0x68
aacb/common/metrics.(*counter).Add()
/Users/abcd/prg/aacb/common/metrics/counter.go:43 +0xa8
...
(REDACTED)
Previous write at 0x00c000591100 by goroutine 651:
go.opentelemetry.io/otel/attribute.NewSetWithSortableFiltered()
/Users/abcd/go/pkg/mod/go.opentelemetry.io/otel@v1.14.0/attribute/set.go:275 +0x2a8
go.opentelemetry.io/otel/attribute.NewSet()
/Users/abcd/go/pkg/mod/go.opentelemetry.io/otel@v1.14.0/attribute/set.go:194 +0xa4
go.opentelemetry.io/otel/sdk/metric.(*instrumentImpl[...]).aggregate()
/Users/abcd/go/pkg/mod/go.opentelemetry.io/otel/sdk/metric@v0.37.0/instrument.go:198 +0xdc
go.opentelemetry.io/otel/sdk/metric.(*instrumentImpl[...]).Add()
/Users/abcd/go/pkg/mod/go.opentelemetry.io/otel/sdk/metric@v0.37.0/instrument.go:186 +0xa0
go.opentelemetry.io/otel/sdk/metric.(*instrumentImpl[...]).Add()
<autogenerated>:1 +0x68
aacb/common/metrics.(*counter).Add()
/Users/abcd/prg/aacb/common/metrics/counter.go:43 +0xa8
...
(REDACTED)
==================
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 39 (34 by maintainers)
This specification issue is looking to add an additional argument to Record and possibly Add. I had considered migrating the attribute parameter to our options pattern (i.e. WithAttributes(/*...*/)), but originally thought it would be overkill for the methods as I didn't expect any other options to be added. That assumption appears to have been incorrect. I'll plan to look into a proposal that will take that approach.
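For illustration, a rough sketch of what that options-pattern call shape could look like; every name here (AddOption, WithAttributes, the counter interface) is an assumption for the sketch, not the API that existed at the time:

```go
// Hypothetical options-pattern shape for the synchronous Add call.
package optionspattern

import (
	"context"

	"go.opentelemetry.io/otel/attribute"
)

// AddOption is an assumed option type for Add.
type AddOption interface{ isAddOption() }

type withAttributes struct{ attrs []attribute.KeyValue }

func (withAttributes) isAddOption() {}

// WithAttributes is the option constructor referenced in the comment above.
func WithAttributes(attrs ...attribute.KeyValue) AddOption {
	return withAttributes{attrs: attrs}
}

// counter stands in for an options-based synchronous instrument.
type counter interface {
	Add(ctx context.Context, incr int64, opts ...AddOption)
}

func example(ctx context.Context, c counter) {
	c.Add(ctx, 1, WithAttributes(
		attribute.String("peer.service", "checkout"),
		attribute.Int("http.status_code", 200),
	))
}
```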
Another suggestion from the SIG meeting today (cc @MadVikingGod):
Have methods that accept no attributes and an equivalent *WithAttributes method to accept the attributes as an attribute.Set. For example, methods along the lines of the sketch after this comment would be used.
This would help eliminate the boilerplate of calling *attribute.EmptySet() when no attributes are wanted.
It's not immediately clear if this will comply with the specification though, i.e.:
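The sketch referenced above (hypothetical; the original example was not preserved in this thread, and the names simply follow the *WithAttributes convention mentioned in the comment):

```go
// Hypothetical sketch of the suggested method pair.
package withattributes

import (
	"context"

	"go.opentelemetry.io/otel/attribute"
)

type Int64Counter interface {
	// Add records an increment with no attributes.
	Add(ctx context.Context, incr int64)
	// AddWithAttributes records an increment with an immutable attribute.Set,
	// avoiding *attribute.EmptySet() boilerplate when no attributes are wanted.
	AddWithAttributes(ctx context.Context, incr int64, attrs attribute.Set)
}
```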
If the specification is talking about a logical API, I think having multiple methods to satisfy it is compliant.
@MrAlias Yes please 😃. Taking an immutable set will also avoid allocating and copying the attributes each time we need to record an event.
I don’t think so. The original caller is still active on the call stack, so if the slice is modified at this point it means the slice was modified by the caller during its own call.
I'm still concerned about an additional, obligatory memory allocation that results from having a single API accept the current attribute.Set. It may be possible to work around some of the limitations by redesigning attribute.Set (and I think we may want to support context-scoped attributes), but as the current code stands, the problem allocation comes from code organized to build an attribute set incrementally, like the sketch below. The user in this case has already organized their code to produce a list of attributes, as is typical of a statsd library. If we replace the three instrument operations with a shared attribute.Set, that is an improvement in a sense, because if the SDK will compute the attribute.Set each time then letting the user compute it once is an improvement.
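The sketch referenced above, i.e. the statsd-style incremental pattern (hypothetical instrument and attribute names; the stand-in interfaces mirror the variadic API shape quoted elsewhere in this thread):

```go
// Incrementally built attribute list shared by several instrument operations.
package incremental

import (
	"context"

	"go.opentelemetry.io/otel/attribute"
)

type addInstrument interface {
	Add(ctx context.Context, incr int64, attrs ...attribute.KeyValue)
}

type recordInstrument interface {
	Record(ctx context.Context, value float64, attrs ...attribute.KeyValue)
}

type instruments struct {
	requests  addInstrument
	bytesSent addInstrument
	latency   recordInstrument
}

func recordRequest(ctx context.Context, m instruments, tenant string, cached bool, bytes int64, seconds float64) {
	// Attributes are assembled incrementally, as a statsd-style wrapper would.
	attrs := []attribute.KeyValue{attribute.String("tenant", tenant)}
	if cached {
		attrs = append(attrs, attribute.Bool("cache.hit", true))
	}

	// Three instrument operations share the same list; with the current API,
	// each call converts the list into an attribute.Set (allocating) again.
	m.requests.Add(ctx, 1, attrs...)
	m.bytesSent.Add(ctx, bytes, attrs...)
	m.latency.Record(ctx, seconds, attrs...)
}
```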
In the case of the Lightstep metrics SDK, I mentioned the use of a map[uint64]*record which avoids constructing the attribute.Set if the entry already exists, as determined by a fingerprint function. That is, there are no allocations inside the SDK, whereas attribute.Set would be a new allocation if it were to be required. The LS metrics SDK will compute the fingerprint 3 times, but it would still not allocate. (Note that the Prometheus server and its Go client library use a similar organization, a map of fingerprints referring to a list of entries.)
This leads me to think that we should find a way to eliminate attribute.Set in favor of something better, call it the AttributeSetBuilder to be concrete. I don't have a prototype to show, but I wonder if it's possible to take the best of both cases, something that allows an incrementally-built list of KeyValue to be reported, so that the attribute.Set is only computed if necessary (once, lazily) and that a fingerprint is also computed only once.
I guess this leads me to agree that it would be nice for high-performance applications to have an option to amortize the construction cost of attribute sets; it's also possible for SDKs to avoid the attribute.Set allocation itself. Can we have the best of both?
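A rough illustration of that fingerprint-keyed organization (hypothetical code, not the actual Lightstep SDK or Prometheus implementation; fingerprint and matches are assumed helpers, and the fingerprint here is order-sensitive for brevity):

```go
// Fingerprint-keyed lookup that skips attribute.Set construction on a hit.
package fingerprintstore

import (
	"hash/fnv"
	"sync"

	"go.opentelemetry.io/otel/attribute"
)

type record struct {
	set   attribute.Set // constructed only on first use of this attribute list
	value int64
}

type store struct {
	mu      sync.Mutex
	records map[uint64][]*record // fingerprint -> entries (list tolerates collisions)
}

// fingerprint is an assumed helper: a cheap hash over the caller's KeyValues.
func fingerprint(attrs []attribute.KeyValue) uint64 {
	h := fnv.New64a()
	for _, kv := range attrs {
		h.Write([]byte(kv.Key))
		h.Write([]byte(kv.Value.Emit()))
	}
	return h.Sum64()
}

// matches is an assumed helper comparing an existing Set to an incoming list.
func matches(set attribute.Set, attrs []attribute.KeyValue) bool {
	if set.Len() != len(attrs) {
		return false // brevity: ignores duplicate keys in attrs
	}
	for _, kv := range attrs {
		if v, ok := set.Value(kv.Key); !ok || v.Emit() != kv.Value.Emit() {
			return false
		}
	}
	return true
}

func (s *store) add(attrs []attribute.KeyValue, incr int64) {
	fp := fingerprint(attrs)
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, r := range s.records[fp] {
		if matches(r.set, attrs) {
			r.value += incr // hit: no attribute.Set construction, no allocation
			return
		}
	}
	// Miss: build the Set (which copies and sorts) once per distinct list.
	if s.records == nil {
		s.records = make(map[uint64][]*record)
	}
	s.records[fp] = append(s.records[fp], &record{set: attribute.NewSet(attrs...), value: incr})
}
```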
I think I would prefer to have a single method. It reduces the API surface, which IMO has benefits for both users of the API and for the developers of API implementations. Users who do not like boilerplate can always create their own helpers/adapters.
After https://github.com/open-telemetry/opentelemetry-go/pull/3957 they will be able to pass attribute.Set{}, and they can currently just use attribute.NewSet() instead of *attribute.EmptySet(). We could comment attribute.EmptySet to say to use it if you need a pointer (though new(attribute.Set) will also do).
Sure, but isn't it more performant for a user to just re-use a set instead of allocating a new group of attribute args each time? It seems like there is an order of magnitude between the two, and accepting a set allows the user to choose how performant they want to be or not, without us having to add an additional bound-instrument API.
I think that’s what’s really happening right now, no?
You have:
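(The snippet referenced by "You have:" was not preserved. For illustration, a sketch of the call shape and the two allocations being described; the adder interface is a stand-in for the signature quoted elsewhere in this thread.)

```go
// Illustration of the allocation path described below.
package allocpath

import (
	"context"

	"go.opentelemetry.io/otel/attribute"
)

// adder stands in for the synchronous counter API shape quoted in this thread:
// Add(context.Context, value int64, ...attribute.KeyValue).
type adder interface {
	Add(ctx context.Context, incr int64, attrs ...attribute.KeyValue)
}

func recordRequest(ctx context.Context, counter adder) {
	// The variadic call site allocates a fresh []attribute.KeyValue...
	counter.Add(ctx, 1,
		attribute.String("k1", "v1"),
		attribute.String("k2", "v2"),
	)
	// ...and the SDK then converts that slice with attribute.NewSet(attrs...),
	// allocating (and sorting) an attribute.Set on every call.
}
```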
The attrs ...attribute.KeyValue argument gets converted into a []attribute.KeyValue, and it needs an allocation. Later, it gets internally converted into an attribute.Set, allocating the set.
@jmacd What if AddWithAttributes(context.Context, value int64, attribute.Set) is Add(context.Context, value int64, attribute.Set), alongside Add(context.Context, value int64, ...attribute.KeyValue)?
In my opinion, the current API is working as designed. Comments on the synchronous instruments should state that the input slice of ...attribute.KeyValue is meant to be the one a compiler would generate using the arguments supplied at the call site, because the SDK will perform the necessary sorting step in place without an allocation.
By the way, this conversation is related to https://github.com/open-telemetry/opentelemetry-collector/issues/7455, where I've stated that even calling attribute.NewSet() is too expensive for every synchronous operation.
I think it should be possible to work around the race condition by pre-sorting the list of attributes. For example, I expect attribute.NewSet(input...).ToSlice() to return the correctly sorted, deduplicated attributes, and I think it ought not to race. (Does sort.Stable() modify an already-sorted slice? If the answer is yes, probably we should check whether the slice is sorted before sorting it, to avoid this race.)
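A sketch of that pre-sorting workaround (hypothetical helper; it only avoids the race if, as questioned above, sorting an already-sorted slice performs no writes):

```go
// Package presort sketches the pre-sorting workaround suggested above.
package presort

import "go.opentelemetry.io/otel/attribute"

// presortedAttrs builds the attribute.Set once, so the returned slice is
// already sorted by key and deduplicated; callers would then reuse it, e.g.
// counter.Add(ctx, 1, presortedAttrs()...).
func presortedAttrs() []attribute.KeyValue {
	set := attribute.NewSet(
		attribute.String("k2", "v2"),
		attribute.String("k1", "v1"),
	)
	return set.ToSlice() // sorted by key and deduplicated
}
```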
Add(context.Context, value int64, ...attribute.KeyValue)
there is a less ergonomic form likeAddWithAttributes(context.Context, value int64, attribute.Set)
. This would give the user the choice to have a simple (familiar) API or a more sophisticated one when performance really matters. I also think this is a nice step short of “bound” instruments. I think https://github.com/open-telemetry/opentelemetry-go/pull/3947 is a step too far. By the way, I think the specification supports this direction, with this text:Hi,
Can I take this bug?