runtime: HttpHeaders thread safety and behavior of invalid values
The `HttpHeaders` collections were never thread-safe. Accessing a header may force lazy parsing of its value, resulting in modifications to the underlying data structures.
Before .NET 6.0, reading from the collection concurrently happened to be thread-safe in most cases. If the collection contained an invalid header value, that value would be removed during enumeration; in rare cases, this could cause problems because the underlying `Dictionary` was being modified concurrently.
Starting with 6.0, less locking is performed around header parsing, as it was no longer needed internally (#54130). Due to this change, multiple examples of users accessing the headers concurrently have surfaced. The ones currently known to the team are gRPC’s .NET client (#55898, #55896) and NewRelic’s HttpClient instrumentation (https://github.com/newrelic/newrelic-dotnet-agent/issues/803). Violating thread safety in 6.0 may result in header values being duplicated or malformed, or in various exceptions being thrown during enumeration or other header accesses.
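To make the failure mode concrete, here is a minimal sketch (not taken from any of the reports above) of the kind of access pattern that is unsafe on 6.0: several threads enumerating the same live `Headers` collection, where the first enumeration may lazily parse values while the others are still reading. The URL is a placeholder, and because the failures depend on timing, the sketch may not reproduce the problem on demand.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

class HeadersRaceSketch
{
    static async Task Main()
    {
        using var client = new HttpClient();
        using var response = await client.GetAsync("https://example.com");

        Parallel.For(0, Environment.ProcessorCount, _ =>
        {
            // Unsafe on .NET 6: concurrent enumeration can observe the store mid-parse
            // and throw IndexOutOfRangeException / NullReferenceException /
            // InvalidCastException, or yield duplicated/malformed values.
            foreach (var header in response.Headers)
            {
                _ = string.Join(", ", header.Value);
            }
        });
    }
}
```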
I am posting a few known call stacks below to help other users hitting this issue discover the root cause:
IndexOutOfRangeException
System.IndexOutOfRangeException: Index was outside the bounds of the array.
at System.Net.Http.Headers.HttpHeaders.ReadStoreValues[T](Span`1 values, Object storeValue, HttpHeaderParser parser, Int32& currentIndex)
at System.Net.Http.Headers.HttpHeaders.GetStoreValuesAsStringOrStringArray(HeaderDescriptor descriptor, Object sourceValues, String& singleValue, String[]& multiValue)
at System.Net.Http.Headers.HttpHeaders.GetStoreValuesAsStringArray(HeaderDescriptor descriptor, HeaderStoreItemInfo info)
at System.Net.Http.Headers.HttpHeaders.GetEnumeratorCore()+MoveNext()
NullReferenceException
System.NullReferenceException: Object reference not set to an instance of an object.
at System.Net.Http.Headers.HttpHeaders.ReadStoreValues[T](Span`1 values, Object storeValue, HttpHeaderParser parser, Int32& currentIndex)
at System.Net.Http.Headers.HttpHeaders.GetStoreValuesAsStringOrStringArray(HeaderDescriptor descriptor, Object sourceValues, String& singleValue, String[]& multiValue)
at System.Net.Http.Headers.HttpHeaders.GetStoreValuesAsStringArray(HeaderDescriptor descriptor, HeaderStoreItemInfo info)
at System.Net.Http.Headers.HttpHeaders.GetEnumeratorCore()+MoveNext()
InvalidCastException
InvalidCastException: Unable to cast object of type 'System.Collections.Generic.List`1[System.Object]' to type 'System.Net.Http.Headers.MediaTypeHeaderValue'.
at System.Net.Http.Headers.HttpContentHeaders.get_ContentType()
There is precedent for concurrent reads being thread-safe on some collections (namely `Dictionary`), and users may assume `HttpHeaders` behaves the same.
A related issue here is that forcing the validation of an invalid value will result in its removal from the collection. These Schrödinger’s headers are difficult to reason about as their existence depends on whether/how/when they are observed.
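As a rough illustration (a sketch assuming the removal semantics described above; the choice of the `Date` header and the invalid value are arbitrary):

```csharp
using System;
using System.Net.Http;

using var request = new HttpRequestMessage();
request.Headers.TryAddWithoutValidation("Date", "not a date");

// The raw value is visible for as long as nothing validates it.
Console.WriteLine(request.Headers.NonValidated["Date"]);          // not a date

// A validating read forces parsing; the unparseable value is dropped in the process.
Console.WriteLine(request.Headers.Date.HasValue);                 // False

// After the validating read, the header is gone from both views.
Console.WriteLine(request.Headers.NonValidated.Contains("Date")); // False
```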
We can look to make the behavior more intuitive. We should investigate whether we can (efficiently) implement the collection in a way that satisfies the following:
- concurrent reads are thread-safe and return the same result to all readers
- concurrent write/modification/remove operations are not supported and their behavior continues to be undefined
- a “validating read” of an invalid value does not result in the removal of the invalid value; the invalid value will still be visible to the `NonValidated` view of the collection
Should enumerating the collection (with or without validation) while a concurrent write operation is in progress have defined behavior?
- We can choose to document that concurrent reads are supported, but not while a write operation is in progress. This matches `Dictionary`’s thread-safety model.
- Concurrent reads are expected to occur over the response headers, where write operations are uncommon.
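Until the guarantees are documented, one defensive pattern callers can use today (a sketch, not an official recommendation; `HeaderSnapshot` is a hypothetical helper) is to enumerate, and therefore parse, the headers exactly once on a single thread and hand the resulting immutable snapshot to any concurrent readers instead of the live collection. This assumes nothing else touches the collection while the snapshot is being built.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http.Headers;

static class HeaderSnapshot
{
    public static IReadOnlyDictionary<string, string[]> Create(HttpHeaders headers)
    {
        // HttpHeaders enumerates as KeyValuePair<string, IEnumerable<string>>;
        // a single pass copies everything into plain strings.
        return headers.ToDictionary(
            pair => pair.Key,
            pair => pair.Value.ToArray(),
            StringComparer.OrdinalIgnoreCase);
    }
}

// Usage: var snapshot = HeaderSnapshot.Create(response.Headers);
// The snapshot can then be read from multiple threads without touching HttpHeaders.
```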
About this issue
- State: closed
- Created 3 years ago
- Reactions: 5
- Comments: 27 (13 by maintainers)
They have fixed the issue in their instrumentation (https://github.com/newrelic/newrelic-dotnet-agent/issues/803); please try updating their agent to version 9.2.0, which was released a few hours ago.
I can understand that if it’s only two users, although we’ve had issues on every system running .NET 6.
I don’t know yet if we’ll continue with upgrades. We are in the process of phasing out the proxy entirely, so we’ll probably hold off on further upgrades until then.
I have another exception, though, that may or may not be related, just to give you some more info.
We were getting TimeoutException on around 5% of requests to DynamoDB when using the proxy. The exception occurred 100 seconds after the request. We tested without the proxy and didn’t see a single error. These requests were not logged at the proxy at all, and there is nothing else that would explain it. Other requests, even from the same process, were being logged through the proxy at the same time, which is why I thought it was probably a different manifestation of the same issue.
@a-jackson Is the above fix sufficient to unblock your upgrades to 7.0?
The issue will have to be addressed in `main` (currently planned for 7.0) first. Given we’ve only heard of two users hitting this, and that a workaround exists, I think it’s unlikely we would service an eventual fix for this at this time.
cc: @karelz regarding any potential servicing changes here (fix for #65379).
I cannot disable the proxy on the machines where the error occurs. But I can see if I can try running the scenario on my dev machine, which has no proxy.