netty: Performance regression introduced in 4.1.35 and later

While upgrading the netty version from 4.1.33 to 4.1.39 in our application, we observed a performance regression in the newer versions.

After narrowing it down, client performance appears to have degraded gradually starting with 4.1.35. We use JMH to measure throughput for our netty client code, and from 4.1.37 onward throughput has dropped by around 10%. The degradation occurs on both the H1 and H2 code paths.

Are you aware of anything between those versions that could cause performance degradation? Are there any performance tests in place to catch regressions?

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 16 (9 by maintainers)

Most upvoted comments

@dagnir thanks a lot for confirming this! I will release the next version mid next week.

@normanmaurer @njhill Our numbers are back to baseline with this change (or slightly improved). Thanks for the quick turnaround!

When can we expect to see 4.1.40.Final released to Maven?

@dagnir thanks a lot for the details… This makes a lot of sense. @njhill is working on a fix already 😃

Yes! On my to-do list for tomorrow.

On 11.09.2019 at 22:08, Dongie Agnir notifications@github.com wrote:

Hi @normanmaurer just checking in to see if this fix is still on track to be released this week. Thanks!


Hi @normanmaurer, after some profiling, it seems this is the change that is affecting us: https://github.com/netty/netty/commit/f17bfd0f64189d91302fbdd15103788bf9eabaa2. Specifically, it seems to be the change to stop using a static instance of CancellationException in DefaultPromise#cancel() that’s taking up extra CPU in our case.
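To illustrate why this matters: constructing a Java exception runs fillInStackTrace(), which walks the entire call stack, whereas reusing a pre-allocated instance is a plain reference copy. Below is a minimal, self-contained micro-sketch (not a proper JMH benchmark, and not Netty's actual code) contrasting the two; class and field names are illustrative.

```java
import java.util.concurrent.CancellationException;

public class ExceptionCostSketch {
    // Shared instance, analogous to the static exception DefaultPromise#cancel
    // used before the linked commit.
    static final CancellationException SHARED = new CancellationException();

    static volatile Object sink; // keeps the JIT from eliminating the loop bodies

    // Returns rough nanoseconds per operation: {fresh instance, shared instance}.
    static long[] measure(int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            // Each constructor call runs fillInStackTrace(), walking the stack.
            sink = new CancellationException();
        }
        long perNew = (System.nanoTime() - start) / iterations;

        start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink = SHARED; // reference copy only; no stack walk
        }
        long perShared = (System.nanoTime() - start) / iterations;
        return new long[] { perNew, perShared };
    }

    public static void main(String[] args) {
        long[] ns = measure(200_000);
        System.out.println("fresh: ~" + ns[0] + " ns/op, shared: ~" + ns[1] + " ns/op");
    }
}
```

Per-call the cost is small, but when a cancellation happens on every request, as described below, it adds up to a measurable share of CPU.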

For context, in the SDK we use a new WriteTimeoutHandler that is added to the channel pipeline before sending an HTTP request: https://github.com/aws/aws-sdk-java-v2/blob/79a22b071b381ecbce2b94ce7935ed879971f25f/http-clients/netty-nio-client/src/main/java/software/amazon/awssdk/http/nio/netty/internal/NettyRequestExecutor.java#L187-L194. Since this handler cancels the pending timeout task on a successful write (which is essentially 100% of writes in our benchmark code), a lot of time gets taken up creating the exceptions.
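The pattern described above can be sketched with plain JDK scheduling (names and the 30-second timeout are illustrative, not the SDK's actual code): a timeout task is scheduled before the write and cancelled once the write succeeds, so one cancel() fires per request.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class WriteTimeoutSketch {
    // Schedules a timeout, "performs" the write, then cancels the timeout on success.
    // Returns true if the timeout task was cancelled before it could run.
    static boolean executeWrite(ScheduledExecutorService timer) {
        ScheduledFuture<?> timeout = timer.schedule(
                () -> System.err.println("write timed out"), 30, TimeUnit.SECONDS);

        boolean writeSucceeded = true; // in the benchmark, writes essentially always succeed

        if (writeSucceeded) {
            // This cancel() runs on every request; in Netty 4.1.35+ the analogous
            // DefaultPromise#cancel allocated a fresh CancellationException each time.
            return timeout.cancel(false);
        }
        return false;
    }

    public static void main(String[] args) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        System.out.println("timeout cancelled: " + executeWrite(timer));
        timer.shutdown();
    }
}
```

Because cancellation is the common path rather than the exceptional one here, the per-cancel exception allocation lands directly on the hot path of every request.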

Attached a screenshot of the flame graph from a profiling capture I did (image: netty-4-1-39-cancellationexception).