runtime: WebProxy: IsBypassed(Uri) call is slow under load when bypassOnLocal is true
Description
In WebProxy, the IsBypassed(Uri) call is slow when bypassOnLocal is true under load. This is because the SystemIPGlobalProperties.get_DomainName() is a blocking call due to a lock thereby causing thread contention. This makes IsLocal(Uri) call slow, making IsBypassed(Uri) call slow.
class System.Net.WebProxy:
...
ctor(Uri address, bool bypassOnLocal, string[] bypassList):
// ...
bool IsBypassed(Uri host):
if (host == null)
throw new ArgumentNullException(nameof(host))
return
Address == null ||
(BypassProxyOnLocal && IsLocal(host)) ||
IsMatchInBypassList(host)
bool IsLocal(Uri host):
// some checks, and IPGlobalProperties.GetIPGlobalProperties().DomainName call
// which is slow and blocking under load
bool IsMatchInBypassList(Uri input):
// regex match checks using bypassList
Configuration
- Which version of .NET is the code running on? netcore 3.1
- What OS version, and what distro if applicable? Windows 2016, IIS
- What is the architecture (x64, x86, ARM, ARM64)? x64
- If relevant, what are the specs of the machine?
platform: os : Windows Server 2016 Datacenter kernel_name : Windows kernel_release : 10.0.14393 machine : x86_64
cpu: vendor_id : GenuineIntel model_name : Intel® Xeon® Gold 6132 CPU @ 2.60GHz cpu_cores : 8 cpu_logical_processors : 8 mhz : 2600 family : 6 model : 79 stepping : 0
memory: total : 12.88G
cluster size: 8 nodes
Regression?
DNK
Data
Background: The app that is load-tested is a REST api microservice that calls other REST apis and external network calls are proxied (using the WebProxy class).
- The JMeter load tests was capping at 90/s for the cluster. The times taken by dependency apis did not add up to the total time taken by the transaction. There were large time gaps that were unaccounted for (internal overhead) as the load increased. CPU Usage at ~20% steady (bounded), Memory Available / Usage steady indicating not bound by those resources.
- Load tests resulted in threads added to thread pool for api’s w3wp process. The threads count increased ~2x during the leading 2m of a load run. This pointed to thread pool starvation.
- The endpoint with multiple worker processes (web gardening) perf’ed better at 135/s throughput (wp=2). CPU Usage at 20% * worker processes and steady (bounded). This pointed to thread pool starvation.
JMeter load test result:
Comment Users Duration Nodes # Samples Average Median 90% Line 95% Line 99% Line Min Max Error % Throughput
before 360 10m 8 55084 3863 3752 4505 4862 10443 2064 12122 0.00% 88.79632
Analysis
The PerfView and memory dump traces collected during a load test pointed to the problem. The calls were slow while getting the DNS related information for the domain name in the URL (System.Net.NetworkInformation.SystemIPGlobalProperties.get_DomainName()+24).
[0x7ffd6ba75a84] ntdll!ZwWaitForSingleObject+14
[0x7ffd68404daf] KERNELBASE!WaitForSingleObjectEx+8f
[0x7ffd6749e5ec] dnsapi!TimedLock_Enter+2c
[0x7ffd6749ebd2] dnsapi!NetInfo_Get+2de
[0x7ffd674b78be] dnsapi!Config_GetDnsServerList+52
[0x7ffd674b79d3] dnsapi!Config_GetDnsServerListIp4+3b
[0x7ffd674a9ee7] dnsapi!DnsQueryConfig+21bd7
[0x7ffd6745c006] IPHLPAPI!AllocateAndGetFixedInfo+76
[0x7ffd6745c341] IPHLPAPI!GetNetworkParams+31
[0x7ffcef3ceae1] ILStubClass.IL_STUB_PInvoke(Microsoft.Win32.SafeHandles.SafeLocalAllocHandle, UInt32 ByRef)+a1
[0x7ffcef3ce8ad] System.Net.NetworkInformation.HostInformationPal.GetFixedInfo()+cd
[0x7ffcef3ce7b4] System.Net.NetworkInformation.SystemIPGlobalProperties.get_DomainName()+24
[0x7ffcef3ce698] System.Net.WebProxy.IsLocal(System.Uri)+a8
[0x7ffcee0b4c6a] System.Net.WebProxy.IsBypassed(System.Uri)+3a
[0x7ffcee0b4a8f] System.Net.Http.HttpConnectionPoolManager.SendAsync(System.Net.Http.HttpRequestMessage, Boolean, System.Threading.CancellationToken)+7f
[0x7ffcee0b4966] System.Net.Http.HttpConnectionHandler.SendAsync(System.Net.Http.HttpRequestMessage, System.Threading.CancellationToken)+16
The WebProxy instance is a singleton with the proxy uri, bypassOnLocal set to true, and uris for bypassList.
class System.Net.WebProxy:
...
ctor(Uri address, bool bypassOnLocal, string[] bypassList):
// ...
bool IsBypassed(Uri host):
if (host == null)
throw new ArgumentNullException(nameof(host))
return
Address == null ||
(BypassProxyOnLocal && IsLocal(host)) ||
IsMatchInBypassList(host)
bool IsLocal(Uri host):
// some checks, and IPGlobalProperties.GetIPGlobalProperties().DomainName call
// which is slow and blocking under load
bool IsMatchInBypassList(Uri input):
// regex match checks using bypassList
We found that the SystemIPGlobalProperties.get_DomainName() is a blocking call due to a lock (see dnsapi!TimedLock_Enter+2c in IL trace above) thereby causing thread contention. This makes IsLocal(Uri) call slow. I set BypassProxyOnLocal to false to bypass the IsLocal(host) call. The change resulted in ~4x’ing the throughput for the load test to 320/s (from 90/s) for the cluster, reducing the response time to 2.5s avg (from 3.9s avg), with a higher 810 users (from 360 users). CPU Usage was below 10% (from 20%) and api’s w3wp threads count steady at 50-60 (from 50-100).
JMeter load test results (before and after):
Comment Users Duration Nodes # Samples Average Median 90% Line 95% Line 99% Line Min Max Error % Throughput
before 360 10m 8 55084 3863 3752 4505 4862 10443 2064 12122 0.00% 88.79632
after 810 10m 8 184909 2438 2339 3022 3194 3493 1951 7857 0.00% 319.79447
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 19 (15 by maintainers)
LTS doesn’t mean everything is backported. Only select fixes are backported when they’re deemed impactful enough to enough apps/developers. cc: @karelz
If the performance is very critical for your scenario you could copy the code from .Net 5.0 in your project. License allows this and it is not so not much work.
https://github.com/dotnet/runtime/pull/1914/files#diff-35de7b31330089f70ddb6d21a92db1483e2b3f82358860c5ea0073ee5f90d1f9R35
Nope, you’re right, it’s static, I somehow thought it was instance. Apologies.
And, indeed, it is being cached correctly. So while there’s other stuff to be cleaned up here (which I’ll do shortly), this method is doing what it’s supposed to be doing (at least on Windows).
@rmandvikar, can you please share a repro that shows otherwise? You wrote: “We found that the SystemIPGlobalProperties.get_DomainName() is a blocking call due to a lock” are you seeing that method being called repeatedly, or just at the beginning of the app’s execution? Is it possible your trace is just showing the initial locking that’s happening, potentially with many concurrent requests on first use all blocked waiting for this?
It is, by EnsureInitialized.