runtime: WebProxy: IsBypassed(Uri) call is slow under load when bypassOnLocal is true

Description

In WebProxy, the IsBypassed(Uri) call is slow when bypassOnLocal is true under load. This is because the SystemIPGlobalProperties.get_DomainName() is a blocking call due to a lock thereby causing thread contention. This makes IsLocal(Uri) call slow, making IsBypassed(Uri) call slow.

class System.Net.WebProxy:
    ...
    ctor(Uri address, bool bypassOnLocal, string[] bypassList):
        // ...
    
    bool IsBypassed(Uri host):
        if (host == null)
            throw new ArgumentNullException(nameof(host))
        return
            Address == null ||
            (BypassProxyOnLocal && IsLocal(host)) ||
            IsMatchInBypassList(host)
    
    bool IsLocal(Uri host):
        // some checks, and IPGlobalProperties.GetIPGlobalProperties().DomainName call
        // which is slow and blocking under load
    
    bool IsMatchInBypassList(Uri input):
        // regex match checks using bypassList

Configuration

  • Which version of .NET is the code running on? netcore 3.1
  • What OS version, and what distro if applicable? Windows 2016, IIS
  • What is the architecture (x64, x86, ARM, ARM64)? x64
  • If relevant, what are the specs of the machine?

platform: os : Windows Server 2016 Datacenter kernel_name : Windows kernel_release : 10.0.14393 machine : x86_64

cpu: vendor_id : GenuineIntel model_name : Intel® Xeon® Gold 6132 CPU @ 2.60GHz cpu_cores : 8 cpu_logical_processors : 8 mhz : 2600 family : 6 model : 79 stepping : 0

memory: total : 12.88G

cluster size: 8 nodes

Regression?

DNK

Data

Background: The app that is load-tested is a REST api microservice that calls other REST apis and external network calls are proxied (using the WebProxy class).

  • The JMeter load tests was capping at 90/s for the cluster. The times taken by dependency apis did not add up to the total time taken by the transaction. There were large time gaps that were unaccounted for (internal overhead) as the load increased. CPU Usage at ~20% steady (bounded), Memory Available / Usage steady indicating not bound by those resources.
  • Load tests resulted in threads added to thread pool for api’s w3wp process. The threads count increased ~2x during the leading 2m of a load run. This pointed to thread pool starvation.
  • The endpoint with multiple worker processes (web gardening) perf’ed better at 135/s throughput (wp=2). CPU Usage at 20% * worker processes and steady (bounded). This pointed to thread pool starvation.

JMeter load test result:

Comment Users Duration Nodes # Samples Average Median 90% Line 95% Line 99% Line  Min   Max Error % Throughput
before    360      10m     8     55084    3863   3752     4505     4862    10443 2064 12122   0.00%   88.79632

Analysis

The PerfView and memory dump traces collected during a load test pointed to the problem. The calls were slow while getting the DNS related information for the domain name in the URL (System.Net.NetworkInformation.SystemIPGlobalProperties.get_DomainName()+24).

[0x7ffd6ba75a84] ntdll!ZwWaitForSingleObject+14
[0x7ffd68404daf] KERNELBASE!WaitForSingleObjectEx+8f
[0x7ffd6749e5ec] dnsapi!TimedLock_Enter+2c
[0x7ffd6749ebd2] dnsapi!NetInfo_Get+2de
[0x7ffd674b78be] dnsapi!Config_GetDnsServerList+52
[0x7ffd674b79d3] dnsapi!Config_GetDnsServerListIp4+3b
[0x7ffd674a9ee7] dnsapi!DnsQueryConfig+21bd7
[0x7ffd6745c006] IPHLPAPI!AllocateAndGetFixedInfo+76
[0x7ffd6745c341] IPHLPAPI!GetNetworkParams+31
[0x7ffcef3ceae1] ILStubClass.IL_STUB_PInvoke(Microsoft.Win32.SafeHandles.SafeLocalAllocHandle, UInt32 ByRef)+a1
[0x7ffcef3ce8ad] System.Net.NetworkInformation.HostInformationPal.GetFixedInfo()+cd
[0x7ffcef3ce7b4] System.Net.NetworkInformation.SystemIPGlobalProperties.get_DomainName()+24
[0x7ffcef3ce698] System.Net.WebProxy.IsLocal(System.Uri)+a8
[0x7ffcee0b4c6a] System.Net.WebProxy.IsBypassed(System.Uri)+3a
[0x7ffcee0b4a8f] System.Net.Http.HttpConnectionPoolManager.SendAsync(System.Net.Http.HttpRequestMessage, Boolean, System.Threading.CancellationToken)+7f
[0x7ffcee0b4966] System.Net.Http.HttpConnectionHandler.SendAsync(System.Net.Http.HttpRequestMessage, System.Threading.CancellationToken)+16

The WebProxy instance is a singleton with the proxy uri, bypassOnLocal set to true, and uris for bypassList.

class System.Net.WebProxy:
    ...
    ctor(Uri address, bool bypassOnLocal, string[] bypassList):
        // ...
    
    bool IsBypassed(Uri host):
        if (host == null)
            throw new ArgumentNullException(nameof(host))
        return
            Address == null ||
            (BypassProxyOnLocal && IsLocal(host)) ||
            IsMatchInBypassList(host)
    
    bool IsLocal(Uri host):
        // some checks, and IPGlobalProperties.GetIPGlobalProperties().DomainName call
        // which is slow and blocking under load
    
    bool IsMatchInBypassList(Uri input):
        // regex match checks using bypassList

We found that the SystemIPGlobalProperties.get_DomainName() is a blocking call due to a lock (see dnsapi!TimedLock_Enter+2c in IL trace above) thereby causing thread contention. This makes IsLocal(Uri) call slow. I set BypassProxyOnLocal to false to bypass the IsLocal(host) call. The change resulted in ~4x’ing the throughput for the load test to 320/s (from 90/s) for the cluster, reducing the response time to 2.5s avg (from 3.9s avg), with a higher 810 users (from 360 users). CPU Usage was below 10% (from 20%) and api’s w3wp threads count steady at 50-60 (from 50-100).

JMeter load test results (before and after):

Comment Users Duration Nodes # Samples Average Median 90% Line 95% Line 99% Line  Min   Max Error % Throughput
before    360      10m     8     55084    3863   3752     4505     4862    10443 2064 12122   0.00%   88.79632
after     810      10m     8    184909    2438   2339     3022     3194     3493 1951  7857   0.00%  319.79447

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 19 (15 by maintainers)

Most upvoted comments

LTS doesn’t mean everything is backported. Only select fixes are backported when they’re deemed impactful enough to enough apps/developers. cc: @karelz

My app is still on 3.1 and can’t upgrade to 5.0 yet.

If the performance is very critical for your scenario you could copy the code from .Net 5.0 in your project. License allows this and it is not so not much work.

Sorry my misunderstanding

Nope, you’re right, it’s static, I somehow thought it was instance. Apologies.

And, indeed, it is being cached correctly. So while there’s other stuff to be cleaned up here (which I’ll do shortly), this method is doing what it’s supposed to be doing (at least on Windows).

@rmandvikar, can you please share a repro that shows otherwise? You wrote: “We found that the SystemIPGlobalProperties.get_DomainName() is a blocking call due to a lock” are you seeing that method being called repeatedly, or just at the beginning of the app’s execution? Is it possible your trace is just showing the initial locking that’s happening, potentially with many concurrent requests on first use all blocked waiting for this?

s_fixedInfoInitialized is never assigned to true in GetFixedInfo()

It is, by EnsureInitialized.