swift-nio: Synchronous connect errors do not properly unregister their selector.

The Kitura test suite that runs with swift-nio 1.3.1 in 3-4 seconds, takes 133-134 seconds with 1.4.2.

I drilled down a bit and found each ClientBootstrap.connect().wait() waiting longer with 1.4.2(~0.25s) as compared to 1.3.1 (~0.01s). We’re using only loopback connections.

Came across this change: https://github.com/apple/swift-nio/pull/221 Not sure if that causes the problem.

I’ll try to write up a small benchmark.

About this issue

Original URL
State: closed
Created 6 years ago
Comments: 16 (16 by maintainers)

Most upvoted comments

Ok, we worked this out.

This bug manifested only on systems that had getaddrinfo implementations that would return IPv6 addresses on systems that had no IPv6 stack enabled. When that happens, we fall into the following strace flow.

Firstly, we attempt our first outbound connections. This is to ::1. We begin by creating the socket, and registering for nothing. In 1.4.1, registering for nothing means registering only for EPOLLERR, due to #277 (fixed in #286).

[pid  9229] socket(PF_INET6, SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_IP) = 6
[pid  9229] epoll_ctl(3, EPOLL_CTL_ADD, 6, {EPOLLERR, {u32=6, u64=6}}) = 0

We then attempt to connect. Note that this is done synchronously, before any call to epoll_wait.

Here’s the crucial thing: this connect attempt fails synchronously, with EADDRNOTAVAIL. This is because we have no IPv6 address to bind to.

[pid  9229] connect(6, {sa_family=AF_INET6, sin6_port=htons(4433), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EADDRNOTAVAIL (Cannot assign requested address)

Having failed to connect, the Happy Eyeballs implementation immediately attempts to connect to the next IP in the list. We create a new socket (FD 7), register it for nothing, and then connect, getting EINPROGRESS.

[pid  9229] socket(PF_INET, SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_IP) = 7
[pid  9229] epoll_ctl(3, EPOLL_CTL_ADD, 7, {EPOLLERR, {u32=7, u64=7}}) = 0
[pid  9229] connect(7, {sa_family=AF_INET, sin_port=htons(4433), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)

Per our normal behaviour here, we register for writable to wait for the connect to succeed.

[pid  9229] epoll_ctl(3, EPOLL_CTL_MOD, 7, {EPOLLOUT|EPOLLERR|EPOLLRDHUP, {u32=7, u64=7}}) = 0

Here we run into #286, and this is why the master branch does not show this problem. Note that both FD 7 is returned as connected, but also FD 6 is returned with EPOLLHUP. In 1.4.1 with #286 in it, EPOLLHUP will not cause us to close or unregister that FD, so it’ll keep returning.

[pid  9229] epoll_wait(3, {{EPOLLIN, {u32=4, u64=4}}, {EPOLLHUP, {u32=6, u64=6}}, {EPOLLOUT, {u32=7, u64=7}}}, 64, -1) = 3

We now have a healthy socket. We register for readable, and unregister for writable. Eventually, the main thread closes the socket, causing us to EPOLL_CTL_DEL FD 7 and eventually close it.

[pid  9229] epoll_ctl(3, EPOLL_CTL_MOD, 7, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLRDHUP, {u32=7, u64=7}}) = 0
[pid  9229] epoll_ctl(3, EPOLL_CTL_MOD, 7, {EPOLLIN|EPOLLERR|EPOLLRDHUP, {u32=7, u64=7}}) = 0
[pid  9229] epoll_wait(3, {{EPOLLIN, {u32=4, u64=4}}, {EPOLLHUP, {u32=6, u64=6}}}, 64, -1) = 2
[pid  9229] epoll_wait(3, {{EPOLLHUP, {u32=6, u64=6}}}, 64, -1) = 1
[pid  9229] epoll_wait(3, {{EPOLLHUP, {u32=6, u64=6}}}, 64, -1) = 1
[pid  9229] epoll_ctl(3, EPOLL_CTL_DEL, 7, {0, {u32=0, u64=0}}) = 0
[pid  9229] close(7)                    = 0

Notice that the whole time we were doing this, epoll_wait was returning immediately with EPOLLHUP on FD 6. This is where the observed slowdown came from: we were busy-looping on epoll, because we’d insert an FD for the IPv6 connection and never remove it. This prevents the event loop responding to I/O events promptly, and also starves out other work while we process this FD event.

This was subtly fixed by #286 because we now close a channel that gets EPOLLHUP. This is not properly a fix for this issue though: in general, we should not rely on the fact that epoll will moan about unconnected sockets to resolve our failure to handle connect errors.

The bug was introduced by #221. My current working theory is that the default addrinfo hints on IPv4-only Linux systems includes an interesting interpretation of ai_family. This is in contravention to the man page, but entirely possible. As a result, when we stopped using the default hints, we lost this behaviour and started getting the misbehaving IPv6 results.

This suggests we want one fix in the short term, and one in the longer term. The short term fix, which this issue will track, is that we need to better handle synchronous connect errors. Currently if connect errors synchronously, the channel will not appropriately transition to closed/unregistered. It should.

A longer-term fix should be to investigate how easy it is to detect IPv4-only/IPv6-only configurations and set ai_family appropriately in those cases, to avoid burning several syscalls attempting to make a connection we know cannot possibly succeed.

I’ll tackle the first fix now, and open an issue for the second.

Lukasa on Apr 18, 2018

@pushkarnk we are on it 😉 Not sure yet

normanmaurer on Apr 18, 2018