runtime: HTTP/2: Rare data corruption when stress testing against Kestrel
I’ve discovered a data corruption issue when running the HttpStress suite over long stretches of time. The occurrences are very rare: I’ve recorded 8 instances in stress runs totaling over 60M requests.
The issue occurs in requests where the server echoes back random data sent by the client, either headers or content. The final response always differs from the expected value by a single character. In most cases that character has been changed to a different value, but I’ve also recorded a couple of instances where it is missing altogether (so the content length differs from the expected one).
Examples
For example, one operation was expecting
"Bo4JwKXM5KukFjgxQ7E8AV52yeoddJr7XHtKwxu9mVUpl8VbswO3XX9fi2LHWtjlTnCxqM0TD5Lf1FGq9Clguna07DbucdcsliMyvXlVwvqRBUeSlU7J82qfDBpck5qZSTtJtIubQFjzx8pdZ898VJkjrgkjh8kr5GziaKZSZVssIri5MC3cwi49iARCXBtHVsug872EZgY0QsQzyfNV58vNCqzQtMuObaMPx1xEAp2F93jqjREsnOCMxXUI8bLCl9OpV02wi8KG5fD878gR72bDO0SPXe6ZTpT0ftLjPIAf4vHIJLBzuu5jYBQVYcP31lmFCDSDnx"
but got
"Bo4JwKXM5KukFjgxQ7E8AV(2yeoddJr7XHtKwxu9mVUpl8VbswO3XX9fi2LHWtjlTnCxqM0TD5Lf1FGq9Clguna07DbucdcsliMyvXlVwvqRBUeSlU7J82qfDBpck5qZSTtJtIubQFjzx8pdZ898VJkjrgkjh8kr5GziaKZSZVssIri5MC3cwi49iARCXBtHVsug872EZgY0QsQzyfNV58vNCqzQtMuObaMPx1xEAp2F93jqjREsnOCMxXUI8bLCl9OpV02wi8KG5fD878gR72bDO0SPXe6ZTpT0ftLjPIAf4vHIJLBzuu5jYBQVYcP31lmFCDSDnx"
The strings are identical except for the character at index 22 (zero-based), where the returned value was `(` instead of `5`.
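When triaging failures like this one, it helps to pinpoint where two strings diverge and whether the character was substituted or dropped. Below is a minimal Python sketch; `locate_corruption` is a hypothetical helper (not part of HttpStress), and it assumes at most one character differs, as in every instance recorded here.

```python
from typing import Optional, Tuple


def locate_corruption(expected: str, actual: str) -> Optional[Tuple[int, str, str, Optional[str]]]:
    """Pinpoint a single-character difference between two strings.

    Returns (index, kind, expected_char, actual_char), where kind is
    "substituted" or "missing", or None if the strings are equal.
    Raises ValueError if the difference is not one substitution or one deletion.
    """
    if expected == actual:
        return None
    # First index where the strings diverge (or the shorter length if one is a prefix).
    limit = min(len(expected), len(actual))
    i = next((k for k in range(limit) if expected[k] != actual[k]), limit)
    # Same length and identical after position i: exactly one character was replaced.
    if len(actual) == len(expected) and expected[i + 1:] == actual[i + 1:]:
        return (i, "substituted", expected[i], actual[i])
    # One shorter and the rest lines up when expected[i] is skipped: one character was dropped.
    if len(actual) == len(expected) - 1 and expected[i + 1:] == actual[i:]:
        return (i, "missing", expected[i], None)
    raise ValueError("not a single substituted or missing character")


# Truncated prefix of the example above: character 22 was '5' but came back as '('.
print(locate_corruption("Bo4JwKXM5KukFjgxQ7E8AV52ye", "Bo4JwKXM5KukFjgxQ7E8AV(2ye"))
# (22, 'substituted', '5', '(')
```

For a run of repeated characters, a deletion is ambiguous; the helper reports the first position where the strings stop matching.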
The issue overwhelmingly affected the `POST Duplex Slow` operation, which echoes content back, flushing characters one at a time. However, today I recorded a single failure affecting the `GET Headers` operation, which echoes randomized headers:
Unexpected values for header Header-51. Expected
"%2c%3d%7e%25%10%10PuF%07%3cXO)%3cz%16%40fQ5%0dD(%22", "%5b%3aQx", "%185%1f%1d%3e%17", "rRV+"
but got
"%2c%3d%7e%25%10%10PuF%07%3cXO)%3cz%16%40fQ5%0dD(%22", "%5b%3aQx", "%185%1f%1d%3e%|7", "rRV+"
In this case the second-to-last character of the third value has been corrupted. If this is the same bug, it suggests the corruption is not tied to DATA frame granularity, since header values are carried in HEADERS frames rather than DATA frames.
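For multi-valued headers, the comparison has to report which value was hit as well as the character position. A hypothetical Python sketch, assuming substitutions only (same number of values, each of unchanged length), as in the Header-51 failure:

```python
from typing import List, Tuple


def locate_header_corruption(expected: List[str], actual: List[str]) -> List[Tuple[int, int, str, str]]:
    """Compare multi-valued header contents character by character.

    Returns (value_index, char_index, expected_char, actual_char) for every
    differing character. Raises ValueError if the value count or any value's
    length differs, since this sketch only handles substitutions.
    """
    if len(expected) != len(actual):
        raise ValueError(f"value count differs: {len(expected)} vs {len(actual)}")
    diffs = []
    for v, (exp, act) in enumerate(zip(expected, actual)):
        if len(exp) != len(act):
            raise ValueError(f"value {v} changed length: {len(exp)} vs {len(act)}")
        diffs.extend((v, c, e, a) for c, (e, a) in enumerate(zip(exp, act)) if e != a)
    return diffs


# Header-51 values from the failure above, kept as percent-encoded strings.
expected = ["%2c%3d%7e%25%10%10PuF%07%3cXO)%3cz%16%40fQ5%0dD(%22", "%5b%3aQx", "%185%1f%1d%3e%17", "rRV+"]
actual = ["%2c%3d%7e%25%10%10PuF%07%3cXO)%3cz%16%40fQ5%0dD(%22", "%5b%3aQx", "%185%1f%1d%3e%|7", "rRV+"]
print(locate_header_corruption(expected, actual))
# [(2, 14, '1', '|')]
```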
Here’s a list of all corrupted characters if somebody can deduce a pattern:
Index | Expected | Actual |
---|---|---|
22 | 5 | ( |
231 | j | b |
0 | 3 | l |
243 | o | i |
408 | q | U+263C |
70 | H | missing |
327 | 6 | missing |
N/A (header value) | 1 | \| |
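One way to probe the substitutions for a pattern is to XOR each expected code point with the actual one: a recurring mask would hint at a bit-level fault rather than arbitrary overwrites. The Python sketch below does this for the six substitution rows of the table above (the two deletions have no actual character). It is only an exploratory aid under that assumption, not a diagnosis.

```python
# Substitution rows from the table: (index, expected, actual).
# The header-value row has no body index, so it is recorded as None.
SUBSTITUTIONS = [
    (22, "5", "("),
    (231, "j", "b"),
    (0, "3", "l"),
    (243, "o", "i"),
    (408, "q", "\u263c"),
    (None, "1", "|"),
]


def xor_masks(rows):
    """Return (index, expected, actual, mask) with mask = ord(expected) ^ ord(actual)."""
    return [(i, e, a, ord(e) ^ ord(a)) for i, e, a in rows]


for i, e, a, mask in xor_masks(SUBSTITUTIONS):
    # Show the mask in hex and how many bits it flips.
    print(f"{str(i):>4}  {e!r} -> {a!r}  mask=0x{mask:04X}  bits={bin(mask).count('1')}")
```

On this small sample the masks vary widely (0x0008 for j → b versus 0x005F for 3 → l), so no single bit-flip mask stands out.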
More details
- The issue manifests without any cancellation occurring.
- Only reproduced when targeting Kestrel; no occurrences when using http.sys.
- Data corruption impacts both request bytes and response bytes.
- The issue manifests in POST duplex, GET, and PUT operations.
cc @geoffkizer
About this issue
- State: closed
- Created 5 years ago
- Comments: 15 (15 by maintainers)
Can we enable WinHttpHandler in the stress test and see if it repros against Kestrel?
I don’t think we should move it; we were still seeing checksum issues when sending data from the client to the server. I’ll file an issue in the AspNetCore repo.
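On the checksum point: comparing the length plus a CRC-32 is enough to flag both failure modes seen here. CRC-32 detects every error burst of 32 bits or fewer, so any single-byte substitution changes the checksum, and a dropped character changes the length. The sketch below uses Python's `zlib.crc32`; the `verify_echo` helper is hypothetical and is not how HttpStress actually validates payloads.

```python
import zlib
from typing import List


def verify_echo(sent: bytes, received: bytes) -> List[str]:
    """Return human-readable problems; an empty list means the echo matched.

    Hypothetical helper: the length check catches dropped bytes, and the
    CRC-32 comparison catches substituted bytes.
    """
    problems = []
    if len(sent) != len(received):
        problems.append(f"length mismatch: sent {len(sent)}, received {len(received)}")
    sent_crc = zlib.crc32(sent)
    received_crc = zlib.crc32(received)
    if sent_crc != received_crc:
        problems.append(f"crc32 mismatch: sent 0x{sent_crc:08X}, received 0x{received_crc:08X}")
    return problems


# A single substituted byte (index 22, '5' -> '(') keeps the length but changes the CRC.
print(verify_echo(b"Bo4JwKXM5KukFjgxQ7E8AV52ye", b"Bo4JwKXM5KukFjgxQ7E8AV(2ye"))
```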