runtime: Singleton HttpClient doesn't respect DNS changes

As described in the following post: http://byterot.blogspot.co.uk/2016/07/singleton-httpclient-dns.html, once you start keeping a shared HttpClient instance to improve performance, you’ll hit an issue where the client won’t respect DNS record updates in case of failover scenarios.

The underlying issue is that the default value of ConnectionLeaseTimeout is set to -1, infinite. It’ll only close on dispose of the client, which is very inefficient.

The fix is to update the value on the service point like this:

var sp = ServicePointManager.FindServicePoint(new Uri("http://foo.bar/baz/123?a=ab"));
sp.ConnectionLeaseTimeout = 60*1000; // 1 minute

Unfortunately, there’s no way to do this with .NET Core today.

Either ServicePointManager should be brought over to .NET Core or similar equivalent functionality should be enabled some other way.

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Reactions: 66
  • Comments: 77 (24 by maintainers)

Most upvoted comments

This is absolutely an issue. The proposed solution by @darrelmiller seems to demonstrate a lack of understanding of the use case.

We use Azure Traffic Managers to evaluate the health of an instance. They work on top of DNS with short TTLs. A host doesn’t magically know it is unhealthy and start issuing Connection:close headers. The traffic manager simply detects the unhealthy state and adjusts DNS resolution accordingly. By design this is COMPLETELY TRANSPARENT to clients and hosts… as it stands, an application with a static HttpClient resolving DNS (unknowingly) by a traffic manager will never receive a new host when the previously resolved host becomes unhealthy.

As of netcore 1.1 I’m currently searching for a resolution to this exact problem. I don’t see one, nativly anyway.

@onovotny The original issue, as I understand it is that, on the server side, someone wants to update a DNS entry so all future requests to host A go to new IP address. The best solution in my opinion is when that DNS entry is changed, the application on host A should be told to return a Connection:close header to the client on all future responses. That will force all clients to drop the connection and establish a new one. The solution is not to have a timeout value on the client so the client sends a Connection:close when that “lease” timeout expires.

@onovotny, there is no ServicePointManager class in .NET Core. This sounds like a .NET Framework (Desktop) issue. Please open up an issue in UserVoice or Connect.

I think there’s some miscommunication here: DNS TTLs are completely unrelated to HTTP/TCP connection lifetimes.

New HTTP connections do a DNS lookup and connect to the returned host address. Multiple HTTP requests can be made over the same single connection but once opened the HTTP connection (and the underlying TCP connection) does not do a DNS lookup again.

This is correct behavior so even if you make 1000’s of requests over a connection that lasts for days, the DNS resolution doesn’t matter since the connection hasn’t changed. If you make a new HTTP request over a new HTTP connection then you will see a DNS lookup for the new connection.

This DNS refresh has nothing to do with HttpClient, which just uses HttpClientHandler by default (which itself uses the HttpWebRequest sockets stack) and any DNS resolution is handled by that stack. ServicePointManager manages ServicePoint objects in that stack which are connections to a specific host. ServicePointManager does have some minimum client-side caching of DNS lookups which you can tweak.

Default settings are meant to have connections stay open indefinitely for efficiency. If the connected host becomes unavailable then the connection will be broken automatically, causing a new connection with a new DNS lookup, however DNS TTLs will not cause any new connections. If you need that, you’ll have to code that into your app.

If you want another DNS lookup to be done, you need to reset the connection on the client. You can force a max-lifetime of the TCP connection, send a connection: close header or just dispose the HttpClient or the underlying ServicePoint. You can also do the DNS resolution with Dns.GetHostAddresses and use the IP address instead for your HTTP requests.

Hope that helps, if I’m off on this then please let me know.

I’d consider many of the participants in this thread to be something of a “dream team” in this subject matter, yet I’m still not sure if any consensus has been reached on what the best practices are given what we have today. It would be great to get a decisive answer from this group on a couple key points.

Let’s say I have a general purpose HTTP library targeting many platforms, not all of which support ServicePointManager, and I’m attempting provide behavior that manages HttpClient instances smartly, per best practices, by default. I have no idea ahead of time what hosts will be called, let alone whether they’re well behaved with respect to DNS changes/Connection:close headers.

  1. How do I optimally manage HttpClient instances? One per endpoint? One per host/schema/port? Literally one singleton total? (I doubt that, since consumers may want to take advantage of default headers, etc.)

  2. How would I most effectively deal with this DNS issue? Dispose instances after 1 minute? Treat it as “highly unlikely” and don’t provide any disposal behavior by default?

(Not a hypothetical situation btw; I’m wrestling with these exact questions right now.)

Thanks!

@kudoz83 You are correct, a host doesn’t magically know when a traffic manager has adjusted the DNS resolution, hence why I said

when that DNS entry is changed, the application on host A should be told

If you don’t believe that it is practical to notify an origin server that it should return connection:close headers then you are left with just a few choices:

  • Periodically close connections from a client or middleware and pay the cost of re-opening those connections redundantly
  • Have the client poll a service to know when it should reset connections.
  • Use a piece of middleware that is smart enough to know when the DNS has been switched.

And regarding me not understanding the use case. The original issue was based on a blog post where the use case was related to switching between production and staging slots and had nothing to do with health monitoring. Although both scenarios could benefit from being able to receive notifications of DNS/slot switches.

As an FYI, here’s a related post covering this issue and an interim solution. Beware of HttpClient

Unfortunately, there’s no way to do this with .NET Core today. Either ServicePointManager should be brought over to .NET Core or similar equivalent functionality should be enabled some other way.

@onovotny, SocketsHttpHandler in .NET Core 2.1 exposes PooledConnectionLifetime, which serves a purpose similar to ConnectionLeaseTimeout, just at the handler level. Can we consider this issue addressed at this point?

@manigandham

I think there’s some miscommunication here: DNS TTLs are completely unrelated to HTTP/TCP connection lifetimes.

Yes DNS TTLs are fairly unrelated, but not completely.

New HTTP connections do a DNS lookup and connect to the returned host address.

Yes, that is correct. But Windows has a local DNS cache that respects the TTL. If the TTL is high and the DNS server has changed the DNS record, then the new connection will be opened against the old IP address due to client caching of the DNS record. Flushing the client DNS cache is the only workaround to this that I am aware of.

Multiple HTTP requests can be made over the same single connection but once opened the HTTP connection (and the underlying TCP connection) does not do a DNS lookup again.

Correct again.  In a production/staging swap where the swapped out server is still responding, this will cause many future requests to go to the old server, which is bad. However, the most recent conversation has been about the scenario where a server fails and the Traffic Manager swaps to a new server. What happens to the existing connection to the failed server is unknown. If it is a severe hardware failure, then there is a good chance the connection will be dropped. That would force the client to re-establish a connection and that is where a low TTL is useful. If however, the server failure doesn’t break the server and it simply returns 5XX responses then it would be useful for the server to return connection:close to ensure that future requests will be made on a new connection, hopefully to the new working server

This DNS refresh has nothing to do with HttpClient, which just uses HttpClientHandler by default (which itself uses the HttpWebRequest sockets stack)

Yes and no. On .Net core, the HttpClientHandler has been re-written and no-longer uses HttpWebRequest and therefore doesn’t use ServicePointManager. Hence the reason for this issue.

I took a closer look at IIS and there doesn’t appear to be a way to get 5XX responses to come with connection:close headers. It would be necessary to implement a HttpModule to do that magic.

I don’t think that’s the same property. That’s the connection timeout, not the ConnectionLeaseTimeout.

As documented in https://msdn.microsoft.com/en-us/library/system.net.servicepoint.connectionleasetimeout.aspx, it should be dropped periodically in some scenarios. But there’s no apparent way to change the behavior on .NET Core.

Waiting for that https://www.stevejgordon.co.uk/introduction-to-httpclientfactory-aspnetcore or implementing bits of it yourself.

Either ServicePointManager should be brought over to .NET Core

While we have brought ServicePointManager and HttpWebRequest for platform parity, it doesn’t support most of the functionality in .Net Core as the underlying stack resolves to either WinHTTP or Curl which do not expose the advanced control knobs.

Using ServicePointManager to control HttpClient is actually a side-effect on using HttpWebRequest and the managed HTTP stack on .Net Framework. Since WinHttpHandler/CurlHandler are used on .Net Core, SPM has no control over HttpClient instances.

or similar equivalent functionality should be enabled some other way.

Here’s WinHTTP team’s answer:

  1. For security reasons, while the client keeps sending requests and there are active connections, WinHTTP will continue to use the destination IP. There is no way to limit the time these connections remain active.
  2. If all connections to the remote endpoint were closed by the server, new connections will be made querying DNS and thus towards the new IP.

To force clients to the next IP, the only viable solution is ensuring that the server rejects new connections and closes existing ones (already mentioned on this thread). This is the same with what I understand from the ATM documentation: detection is made based on GET requests failing 4 times (non-200 responses) in which case the connections will be closed by the server.

Given other network appliance devices that are themselves caching DNS (e.g. home routers/APs) I wouldn’t recommend DNS fail-over as a solution for low-latency, high availability technique. If that is required, my recommendation for a controlled fail-over would be having a dedicated LB (hardware or software) that knows how to properly drain-stop.

as it stands, an application with a static HttpClient resolving DNS (unknowingly) by a traffic manager will never receive a new host when the previously resolved host becomes unhealthy.

As mentioned above, if by unhealthy you mean that the server returns a HTTP error code and closes the TCP connection while you observe the DNS TTL expired, this is indeed a bug.

Please describe your scenario in more detail:

  1. During the failover, what is the client-machine’s DNS cache and TTLs. You can use ipconfig /showdns and nslookup -type=soa <domain_name> to compare what the machine thinks the TTL is and the authoritative NS’s TTLs.
  2. Are there active connections towards the remote server? You can use netstat /a /n /b to see the connections and processes owning them.
  3. Creating a repro would help us a lot: please attach any available info such as: version of .Net Core/Framework, code for client/server, network traces, remote DNS LB/Traffic Manager involved, etc.

Is there in .Net Core an HTTP client that works over sockets?

Yes, SocketsHttpHandler, and as of 2.1 it’s the default.

@stephentoub just seeing this now, yes, I think PooledConnectionLifetime should address it. The community can open another issue if something is still needed.

I know this is old, but I believe you should not use DateTime.UtcNow for time interval measurement. Use a monitonic time source like Stopwatch.GetTimestamp. This way you’re immune to local time adjustment by operator or time synchronization. There were some buggy chipsets that made QPC values go back, but I don’t believe many of them are in use now.

While the given answers that specify what should happen on the server-side are accurate for an ideal world, in the real world developers often do not have control of the full server-side stack, that they are communicating with. In that regard I really like the concept of a ConnectionLeaseTimeout, as it is a concession to the realities of the daily life of John and Jane Doeveloper. Someone suggested writing middleware to enforce such a timeout on .NET Core. Would this class based on NodaTime and System.Collections.Immutable do the job?

    public class ConnectionLeaseTimeoutHandler : DelegatingHandler
    {
        private static string GetConnectionKey(Uri requestUri) => $"{requestUri.Scheme}://{requestUri.Host}:{requestUri.Port}";

        private readonly IClock clock;
        private readonly Duration leaseTimeout;
        private ImmutableDictionary<string, Instant> connectionLeases = ImmutableDictionary<string, Instant>.Empty;

        public ConnectionLeaseTimeoutHandler(HttpMessageHandler innerHandler, IClock clock, Duration leaseTimeout)
            : base(innerHandler)
        {
            this.clock = clock;
            this.leaseTimeout = leaseTimeout;
        }

        protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
        {
            string key = GetConnectionKey(request.RequestUri);
            Instant now = clock.GetCurrentInstant();

            Instant leaseStart = ImmutableInterlocked.GetOrAdd(ref connectionLeases, key, now);

            if (now - leaseStart > leaseTimeout)
            {
                request.Headers.ConnectionClose = true;
                ImmutableInterlocked.TryRemove(ref connectionLeases, key, out leaseStart);
            }

            return base.SendAsync(request, cancellationToken);
        }
    }

Our code to get an HttpClient is:

			var timeSinceCreated = DateTime.UtcNow - _lastCreatedAt;
			if (timeSinceCreated.TotalSeconds > SecondsToRecreateClient)
			{
				lock (Padlock)
				{
					timeSinceCreated = DateTime.UtcNow - _lastCreatedAt;
					if (timeSinceCreated.TotalSeconds > SecondsToRecreateClient)
					{
						_currentClient = new HttpClient();
						_lastCreatedAt = DateTime.UtcNow;
					}
				}
			}

			return _currentClient;

So I guess the the lock only occurs once per minute so that multiple threads aren’t trying to create a new HttpApiClient simultaneously. I thought we were locking more aggressively.

AFAIK there is not a way to notify a failed host that it is failed? Generally this would mean it is in some kind of error state. The original post is talking about failover scenarios.

Azure Traffic Manager is explained here https://docs.microsoft.com/en-us/azure/traffic-manager/traffic-manager-monitoring

It operates at the DNS level and is transparent to clients and hosts - by design. Azure instances communicating with one another via Traffic Manager for DNS resolution are in a pickle without being able to set a TTL on DNS refresh for an HttpClient.

The way HttpClient currently operates appears to be at odds with Microsoft’s own Azure service offerings - which is the entire point. Not to mention, any similar service (E.G. Amazon Route 53) works exactly the same way, and has the exact same short DNS TTL dependency.

Obviously I found this thread because I’m impacted by the issue. I’m simply trying to clarify that there doesn’t appear to be an ideal workaround currently other than just arbitrarily recreating HttpClients (at the cost of performance and complexity) for the sake of making sure DNS failover succeeds.

Does HttpClient cache URIs’ it hits, similarly to what RestClient I believe does? If it does, please let me know on which version (.NET Core 3.0/.NET Framework 3.0 etc) it supports this.

I see that caching is only done in RestClient if the project is a .Net Core 2.1+ App?

Ah sorry misunderstood what you meant by recycle.

@onovotny @karelz @stephentoub,

Would you all concur that this is an accurate summary of how best to solve the original singleton HttpClient/DNS problem?

  • In .NET Framework 2.0+, use ServicePoint.ConnectionLeaseTimeout.

  • In .NET Core 2.1, use SocketsHttpHandler.PooledConnectionLifetime.

  • For all other platforms/targets/versions, there is no reliable solution/hack/work-around, so don’t even bother trying.

As I continue to learn things from this thread and others (such as ConnectionLeaseTimeout not working reliably across platforms in .NET Standard 2.0), I’m back to the drawing board trying to solve this from a library in a way that works on as many platforms/versions as possible. Platform-sniffing is fine. Periodically disposing/recreating HttpClient is fine, if that’s effective. I’m fairly certain I got it wrong in my first attempt, which was to simply send a connection:close header to the host at regular intervals. Thanks in advance for any advice!

var sp = ServicePointManager.FindServicePoint(new Uri(“http://foo.bar/baz/123?a=ab”)); sp.ConnectionLeaseTimeout = 60*1000; // 1 minute

This code is placed in the constructor?

@withinoneyear from my understanding yes. If you create an HttpClient and utilize it against Production Green, then swap production to Blue the HttpClient will still have an open connection to Green until the connection is closed.

@tmds Thanks for the link, that’s very interesting.

I do wonder how common it is to rely on DNS TTL vs using a load balancer. The latter seems like a much more reliable solution in general.

@tmds I haven’t seen one, but as others have mentioned, this relates to more than just Azure Traffic Manager: AWS has a similar concept and there’s Azure App Service with its staging/production slot swapping. And I bet other cloud providers also offer something similar to this swapping mechanism.

I wouldn’t even be invited to the dream team tryouts, but I do have one insight regarding your singleton HttpClient comment. First, you can always use the SendAsync method to specify different headers and addresses, so you could wrap that with your class that provides shared data similar to HttpClient (e.g. default headers, base address) yet ends up calling the same single client. But even better, you can create multiple HttpClient instances who all share the same HttpClientHandler: https://github.com/dotnet/corefx/issues/5811.

As for the DNS issue, the consensus here seems to be that:

To force clients to the next IP, the only viable solution is ensuring that the server rejects new connections and closes existing ones

If this is not viable for you, Darrel Miller suggested some more options here: https://github.com/dotnet/corefx/issues/11224#issuecomment-270498159.

If you connect to a machine and the TCP/HTTP connection continues to work, why should you at some point assume the server is no longer appropriate. I think there is room for improvement on the Azure side. For example, when machines are unhealthy or swapping environments, existing connections can be closed.

@kudoz83 No worries, we all have days like that 😃

From what I understand ConnectionLeaseTimeout is literally a timer that periodically closes an idle connection from the client side. It is implemented in ServicePointManager, which doesn’t exist in core. This is different than the “idle connection” timeout which only closes a connection once it has not be used for a certain period of time.

There is likely a Win32 call that would allow flushing the DNS client cache. That would ensure new connections do a DNS lookup. I took a quick look for it but didn’t find it, but I’m sure it is there somewhere.

In .net core on Windows, the management of HTTP connections is actually left to the OS. The closest you get is the call to the Win32 API WinHttpOpen https://github.com/dotnet/corefx/blob/master/src/System.Net.Http.WinHttpHandler/src/System/Net/Http/WinHttpHandler.cs#L718 There are no public APIs that I can find to do anything with the _sessionHandle. To get the kind of behavior change you are looking for would require the OS folks to make changes. At least that’s how I understand it.

Wouldn’t it be easier to get Azure Traffic Manager just to set a very low TTL on the DNS entries? If you aren’t concerned about long lived connections, then a low TTL should minimize the chance of new connections being setup against a dead server.

The upside of the way WinHttpHandler is built is that we get things like HTTP/2 for free with the OS update. The downside is we don’t get a whole lot of control in managed code.