netty: Performance regression introduced in 4.1.35 and later
While upgrading the netty version from 4.1.33 to 4.1.39 in our application, we observed a performance regression in the newer versions.
After narrowing it down, client performance appears to have gradually degraded since 4.1.35. We use JMH to measure the throughput of our netty client code, and starting from 4.1.37, throughput has dropped by around 10%. The degradation occurs on both the H1 and H2 code paths.
Are you aware of anything between those versions that could cause performance degradation? Are there any performance tests in place to catch regressions?
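For reference, a rough sketch of the kind of JMH throughput harness used for this measurement (the `NettyHttpClient` wrapper, endpoint, and method names here are illustrative placeholders, not our actual benchmark code):

```java
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
public class NettyClientThroughputBenchmark {

    // Hypothetical wrapper around the netty-based client under test.
    private NettyHttpClient client;

    @Setup
    public void setUp() {
        client = new NettyHttpClient("http://localhost:8080/ping");
    }

    @TearDown
    public void tearDown() {
        client.close();
    }

    @Benchmark
    public Object executeRequest() throws Exception {
        // One request per invocation; JMH reports operations per second.
        return client.execute().get();
    }
}
```

Running the same benchmark against each netty version (4.1.33 through 4.1.39) is how the roughly 10% throughput difference was isolated.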
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 16 (9 by maintainers)
Commits related to this issue
- Avoid CancellationException construction in DefaultPromise. Motivation: #9152 reverted some static exception reuse optimizations due to the problem with Throwable#addSuppressed() raised in #9151. This... — committed to njhill/netty by njhill 5 years ago
- Avoid CancellationException construction in DefaultPromise (#9534). Motivation: #9152 reverted some static exception reuse optimizations due to the problem with Throwable#addSuppressed() raised in #9151. This introduced a performance issue when promises are cancelled ... — committed to netty/netty by njhill 5 years ago
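For context on the optimization these commit messages refer to, here is a rough illustration (not netty's actual code) of the difference between constructing a new CancellationException on every cancel() call and reusing a preallocated instance whose stack trace is never captured:

```java
import java.util.concurrent.CancellationException;

public final class CancellationCostSketch {

    // Per-call construction: every cancellation pays for
    // Throwable.fillInStackTrace(), which walks the current stack.
    static CancellationException perCall() {
        return new CancellationException();
    }

    // Reused instance: constructed once, with stack capture skipped.
    // The trade-off raised in #9151 is that a shared instance is also
    // visible to anyone calling Throwable#addSuppressed() on it.
    private static final CancellationException SHARED =
            new CancellationException() {
                @Override
                public Throwable fillInStackTrace() {
                    return this; // skip the expensive stack capture
                }
            };

    static CancellationException reused() {
        return SHARED;
    }
}
```

Per the commit messages above, #9152 reverted the static-reuse approach because of the addSuppressed() problem, and the linked fix reintroduces a way to avoid the per-cancel construction cost.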
@dagnir thanks a lot for confirming this! I will release the next version in the middle of next week.
@normanmaurer @njhill Our numbers are back to baseline with this change (or slightly improved). Thanks for the quick turnaround!
When can we expect to see 4.1.40.Final released to Maven?
@dagnir thanks a lot for the details… This makes a lot of sense. @njhill is working on a fix already 😃
Yes! It's on my to-do list for tomorrow.
Hi @normanmaurer, after some profiling, it seems this is the change that is affecting us: https://github.com/netty/netty/commit/f17bfd0f64189d91302fbdd15103788bf9eabaa2. Specifically, it seems to be the change to stop using a static instance of CancellationException in DefaultPromise#cancel() that's taking up extra CPU in our case. For context, in the SDK we use a new WriteTimeoutHandler that is added to the channel pipeline before sending an HTTP request: https://github.com/aws/aws-sdk-java-v2/blob/79a22b071b381ecbce2b94ce7935ed879971f25f/http-clients/netty-nio-client/src/main/java/software/amazon/awssdk/http/nio/netty/internal/NettyRequestExecutor.java#L187-L194. Since this handler cancels the pending task on a successful write (which is pretty much 100% of the time in our benchmark code), a lot of time gets taken up creating the exceptions. Added a screenshot of the flame graph from a profiling capture I did.
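To make the hot path concrete, here is a simplified sketch of a write-timeout handler in this style (an assumed shape, not the SDK's actual handler). The point is that on a healthy connection the scheduled timeout task is cancelled after essentially every write, so Future#cancel — and any per-cancel exception construction inside DefaultPromise — is paid once per request:

```java
import java.util.concurrent.TimeUnit;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelOutboundHandlerAdapter;
import io.netty.channel.ChannelPromise;
import io.netty.util.concurrent.ScheduledFuture;

public class SimpleWriteTimeoutHandler extends ChannelOutboundHandlerAdapter {

    private final long timeoutMillis;

    public SimpleWriteTimeoutHandler(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    @Override
    public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise) throws Exception {
        // Schedule a task that tears the channel down if the write stalls.
        ScheduledFuture<?> timeoutTask = ctx.executor().schedule(
                () -> { ctx.close(); },
                timeoutMillis, TimeUnit.MILLISECONDS);

        // On the (common) successful write, cancel the pending timeout task.
        // Future#cancel is where DefaultPromise builds the CancellationException.
        promise.addListener(future -> timeoutTask.cancel(false));

        super.write(ctx, msg, promise);
    }
}
```

Since the benchmark writes virtually always succeed, the cancel branch dominates, which matches the CancellationException construction showing up in the flame graph.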