hyperkit: vsock crash
Hi,
We have an annoying crash of Docker for Mac which, AFAIU, boils down to an invalid ringbuf check here.
Log:
Aug 4 19:28:49 estus Docker[42833] <Error>: Socket.TCPV4.read connected TCP: caught Uwt.Uwt_error(Uwt.ECONNRESET, "uwt_read", "") returning Eof
Aug 4 19:28:49 estus Docker[42833] <Error>: Socket.TCPV4.write connected TCP: caught Uwt.Uwt_error(Uwt.EPIPE, "write", "") returning Eof
Aug 4 19:28:49 estus Docker[42833] <Error>: Socket.Stream: caught Uwt.Uwt_error(Uwt.ENOTCONN, "shutdown", "")
Aug 4 19:28:49 estus Docker[42833] <Error>: Socket.TCPV4.write connected TCP: caught Uwt.Uwt_error(Uwt.EPIPE, "write", "") returning Eof
Aug 4 19:28:49 estus Docker[42833] <Error>: Socket.Stream: caught Uwt.Uwt_error(Uwt.ENOTCONN, "shutdown", "")
Aug 4 19:28:55 estus Docker[42833] <Error>: Socket.TCPV4.write connected TCP: caught Uwt.Uwt_error(Uwt.EPIPE, "write", "") returning Eof
Aug 4 19:28:55 estus Docker[42833] <Error>: Socket.Stream: caught Uwt.Uwt_error(Uwt.ENOTCONN, "shutdown", "")
Aug 4 19:28:55 estus Docker[42833] <Error>: Socket.TCPV4.write connected TCP: caught Uwt.Uwt_error(Uwt.EPIPE, "write", "") returning Eof
Aug 4 19:28:55 estus Docker[42833] <Error>: Socket.Stream: caught Uwt.Uwt_error(Uwt.ENOTCONN, "shutdown", "")
Aug 4 19:28:55 estus Docker[42832] <Error>: Fatal unexpected exception: Socket.Closed
Aug 4 19:28:55 estus Docker[42834] <Notice>: virtio-net-vpnkit: initialising, opts="uuid=01b794ee-68a9-4b92-9c79-b2b44e5f271e,path=/Users/free2use/Library/Containers/com.docker.docker/Data/s50,macfile=/Users/free2use/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/mac.0"
Aug 4 19:28:55 estus Docker[42834] <Notice>: Error reading vsyslog length: EOF
Aug 4 19:28:55 estus Docker[42834] <Notice>: Assertion failed: (sc->reply_cons != sc->reply_prod), function send_response_common, file src/pci_virtio_sock.c, line 879.
As far as I can see the error is here:
if (sc->reply_prod == VTSOCK_REPLYRINGSZ)
        sc->reply_prod = 0;
/* check for reply ring overflow */
/* XXX correct check? */
DPRINTF(("TX: QUEUING REPLY IN SLOT %x (prod %x, cons %x)\n",
        slot, sc->reply_prod, sc->reply_cons));
assert(sc->reply_cons != sc->reply_prod);
I.e. someone should act on that “XXX correct check?” comment 😃
Alternatively, if it’s hard to fix, could we make VTSOCK_REPLYRINGSZ bigger (if that would help make the error rarer)?
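For reference, here is a minimal standalone sketch (not the hyperkit code; the names and the ring size are made up) of the prod/cons idiom that assert seems to be built on; whether the comparison is the right "full" test is exactly what the XXX comment is asking:
/* Standalone illustration only, NOT hyperkit code: names and the ring
 * size are made up. The producer index is advanced first, so
 * prod == cons *after* the increment and wrap is read as "full" (one
 * slot is sacrificed so that "full" can be told apart from "empty",
 * where prod == cons as well). */
#include <assert.h>

#define REPLYRINGSZ 8u  /* illustrative size only */

struct reply_ring {
        unsigned prod;  /* next slot the producer will fill */
        unsigned cons;  /* next slot the consumer will drain */
};

/* Claim the next reply slot; aborts, like the assert above, if the
 * ring is already full. */
static unsigned claim_reply_slot(struct reply_ring *r)
{
        unsigned slot = r->prod++;

        if (r->prod == REPLYRINGSZ)
                r->prod = 0;
        assert(r->cons != r->prod);  /* fires when the ring is full */
        return slot;
}

int main(void)
{
        struct reply_ring r = { 0, 0 };

        /* REPLYRINGSZ - 1 claims succeed; one more would trip the assert. */
        for (unsigned i = 0; i < REPLYRINGSZ - 1; i++)
                claim_reply_slot(&r);
        return 0;
}
Under this one-slot-free convention the assert fires exactly when the ring is full; the usual alternative is to track an explicit element count and compare it against the ring size.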
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Reactions: 17
- Comments: 24
Commits related to this issue
- vsock: Add some comments and descriptively named macros to reply ring This (hopefully) helps make it clearer what is going on when the ring has filled up and why the various tests are correct. This ... — committed to ijc/hyperkit by ijc 8 years ago
- vsock(v7): Ensure we periodically check the reply ring even with lots of rx v7 of the virtio vsock spec proposal adds a requirement to continue processing the TX ring so long as "additional resources... — committed to ijc/hyperkit by ijc 8 years ago
Hi @tailhook & thanks for your report. Sorry for the delay replying, I’ve been AFK travelling.
The assertion is checking whether the reply ring is full. I’m reasonably confident the check is correct, although as the comment indicates a second pair of eyes wouldn’t go amiss 😉. However this is slightly moot in light of seeing the assertion actually fire: even if the check were wrong it would certainly only be an off-by-one, which would indicate that either the ring had but a single slot left or had overflowed by one slot (the latter is obviously worse, but as I’ll explain it is still a bit moot).
The issue is that if the reply ring has filled up (+/-1 if the check is wrong, but one slot either way isn’t going to save us or change what follows) then, due to a protocol bug in the vsock protocol spec, there is actually nothing we can do other than crash or deadlock. The assert turns the deadlock into a crash, which is obviously not ideal either, but better than some other component silently locking up. There is a description of the deadlock in https://github.com/docker/hyperkit/commit/1ea03016ec5dc0e612a2e925b6eefcd1971cb694, which is where the reply ring was introduced (previously the deadlock was much more obvious; there are also some subsequent follow-up fixes, but that is the main one). This all stems from the assumption in the design that vsock is reliable and therefore cannot drop anything, while also not including any back-pressure mechanism, while also including cross talk between the rx and tx rings (i.e. a tx request can result in a requirement to have a free rx slot).
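To spell out the corner the device ends up in, here is a rough sketch (made-up names, not the real code paths) of the decision when a tx request needs a reply:
/* Rough sketch only, not hyperkit's real code paths; names are made up.
 * It spells out why a full reply ring leaves no good option. */
#include <stdbool.h>

enum tx_outcome {
        REPLY_SENT_ON_RX,   /* normal path: reply goes straight onto the rx ring */
        REPLY_PARKED,       /* buffered in the reply ring until the guest refills rx */
        NO_GOOD_OPTION      /* can't drop, can't push back: stall tx (deadlock) or abort */
};

enum tx_outcome handle_tx_needing_reply(bool rx_ring_has_slot,
                                        bool reply_ring_has_slot)
{
        if (rx_ring_has_slot)
                return REPLY_SENT_ON_RX;
        if (reply_ring_has_slot)
                return REPLY_PARKED;
        /* vsock is specified as reliable (nothing may be dropped), has no
         * back-pressure mechanism, and this tx request requires an rx slot,
         * so the device can only stop servicing tx (deadlock) or bail out;
         * the assert picks bailing out. */
        return NO_GOOD_OPTION;
}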
I raised this deadlock upstream a while back and had a discussion with the guys doing the protocol specification (as well as the Linux frontend implementation) and they were going to look at remedying it in the next version. I see in my inbox this morning that I have a new set of frontend patches (RFC v6) and a new version of the specification proposal (v7, nb the spec proposal and the frontend implementation are independently numbered, but v6 of the patches corresponds to v7 of the spec) and a changelog which specifically mentions this issue, so hopefully something has changed (or been more tightly specified) which will resolve this issue at root.
That’s going to take a little time to digest and to make the necessary updates on both ends. In the meantime increasing
VTSOCK_REPLYRINGSZ is not a bad idea; as you can see in 1ea03016ec5dc0e612a2e925b6eefcd1971cb694 I already doubled it once to 2x ring size, and going to 4x ring size can’t hurt.
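Roughly, that amounts to just bumping the multiplier (sketch only: the definition presumably sits alongside the assert in src/pci_virtio_sock.c, and the base VTSOCK_RINGSZ value shown here is assumed for illustration):
/* Sketch of the suggested bump; the base VTSOCK_RINGSZ value is assumed
 * for illustration and the real definition may differ in detail. */
#define VTSOCK_RINGSZ        256                 /* assumed main ring size */

/* before: #define VTSOCK_REPLYRINGSZ (2*VTSOCK_RINGSZ)   -- 2x ring size */
#define VTSOCK_REPLYRINGSZ   (4*VTSOCK_RINGSZ)   /* proposed: 4x ring size */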
If you could give details of your repro case (ideally in the form of a Dockerfile or docker-compose.yml and a simple docker build/run or docker-compose up) that would be handy, since your workload seems to be even better at provoking this behaviour than ones I’ve found previously; to be hitting the assert it must be provoking many outstanding vsock tx requests compared with the ring size.
First pass with beta28 and it looks 👍 . If you don’t see me complain/link to a new issue, this is resolved in the wild for us!
It’s hard to do. We have a huge web project in python. We run
nodemon (to start the python process) and webpack --watch for monitoring filesystem events. There are literally tens of thousands of inotify watches (the default 8k limit doesn’t work for us). Both things run on the (osxfs) volume.
The python process is a web application with pretty large HTML pages (~10Mb). The crash usually happens when we refresh a page in a web browser. In fact, when we use the minified version of the page (which is about 200Kb, i.e. 50x smaller) the failure is rarer. So it probably depends on the amount of traffic between the host system and the containers. Each web page visit also fetches data from a database that is outside of this physical box.
Also, each web page touches a session file which is put on the shared volume. And it may happen (I don’t know how to verify it) that some inotify watches follow that file’s changes, because we misconfigured the watchers in some way.
This is the story. I can’t publish all of our sources. But if you give me some hints about which exact operations influence the ringbuf usage, we can come up with some smaller script to reproduce the issue: do we need filesystem operations? Do network connections or packets fill the ringbuf? What about inotify?