dvtm: Closing a shell causes window to hang on OS X 10.11

There are several ways to trigger this frozen shell problem since upgrading to 10.11. I don’t know enough about C to help, but my wild guess is that it relates to OS X’s new SIP (System Integrity Protection).

  • Quit a window with the key binding (e.g. in my case C-d q)
  • Type exit at a shell
  • Open and then close the buffer mode pager thingy (e.g. in my case C-d e to open a scrollback buffer in less which I gather has something to do with copying and pasting but I just use it for scrolling)

Each of these causes the window to freeze. In essence, I can spawn windows but not close them.

On @martanne’s advice I ran the following after intentionally causing the issue.

[screenshot, not preserved: "screen shot 2016-02-18 at 12 31 57"]

Which seemed to point to this line of code.

[screenshot, not preserved: "screen shot 2016-02-18 at 12 31 29"]

I’ve tried this with clean user configs (bashrc, profile etc), on two different machines and with clean installs of dvtm using the unmodified default config.

Hope someone can help because I’ve been using this thing daily for years now and I really miss it!

About this issue

  • State: open
  • Created 8 years ago
  • Reactions: 2
  • Comments: 26 (5 by maintainers)

Most upvoted comments

I ran into this (or a similar?) problem trying to use dvtm on my macOS 10.14.5 system. I did some investigating, and I think I have made progress toward finding the root cause.

pselect() only sets the signal mask if it blocks. If any of the file descriptors are ready for reading immediately, pselect() returns right away, the signal mask is untouched, and any normally blocked signals are left pending.

When I exit a shell on my Linux machine, the read() in vt_process() returns -1 with errno set to EIO. On my Mac, it returns 0 to indicate EOF. This means that vt_process() also returns 0 and main() never marks the client as dead. main() doesn’t destroy the client on the next iteration, and its file descriptor is added to the set for the next pselect().

Because the file descriptor is still in the fd_set and read() wouldn’t block on EOF, the pselect() returns immediately, the loop runs forever, and SIGCHLD is left unhandled indefinitely.

I’m not sure if this is the right place to fix this (maybe this difference is related to pty setup?), but I made some small changes to vt_process() and main() so that the client is marked dead when it gets to EOF. Everything seems to be working correctly on both my machines. You can see my changes here: https://github.com/kendallschmit/dvtm/commit/233244fbe854f86ce05d4c697855aca35309cccc
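The idea of treating EOF like EIO can be sketched as follows. This is not the linked commit and not dvtm's `vt_process()`; `fd_is_dead` and `fd_is_readable` are hypothetical helpers, and a pipe is assumed as a stand-in for the pty in any experiment.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: reads once from fd and reports whether the peer is
 * gone. Both EOF (read() returns 0, as described for macOS) and a hard
 * error such as EIO (as described for Linux) count as "dead". */
bool fd_is_dead(int fd) {
    char buf[256];
    ssize_t n = read(fd, buf, sizeof buf);

    if (n > 0)
        return false;               /* got data, peer still alive */
    if (n == 0)
        return true;                /* EOF */
    if (errno == EINTR || errno == EAGAIN)
        return false;               /* transient, try again later */
    return true;                    /* EIO or another hard error */
}

/* Returns true if select() reports fd as readable right now (zero timeout). */
bool fd_is_readable(int fd) {
    fd_set rd;
    struct timeval tv = {0, 0};

    FD_ZERO(&rd);
    FD_SET(fd, &rd);
    return select(fd + 1, &rd, NULL, NULL, &tv) == 1;
}
```

With a pipe whose write end has been closed, `fd_is_readable()` keeps returning true even after `read()` has returned 0, which is the condition that lets the loop spin. A main loop that flags and removes a client as soon as `fd_is_dead()` reports true would stop adding that descriptor to the next `pselect()` set.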

Adding |WUNTRACED works for me as well 😺

I spent some time this morning trying to understand why we end up in a SIGSTOP, to no avail.

However, browsing the source of tmux and screen, it appears both pass waitpid options of WNOHANG or WUNTRACED. Indeed, with the following diff I don’t hit this issue.

diff --git a/dvtm.c b/dvtm.c
index 2b7ebdc..1248a59 100644
--- a/dvtm.c
+++ b/dvtm.c
@@ -702,7 +702,7 @@ sigchld_handler(int sig) {
        int status;
        pid_t pid;
 
-       while ((pid = waitpid(-1, &status, WNOHANG)) != 0) {
+       while ((pid = waitpid(-1, &status, WNOHANG|WUNTRACED)) != 0) {
                if (pid == -1) {
                        if (errno == ECHILD) {
                                /* no more child processes */

Edit: Reading the tmux and screen sources beyond a simple grep, I now see they explicitly continue processes if they are stopped. Indeed, with the above diff the shell is lost if it is stopped (for example when signaled via kill), whereas in tmux or screen it is not. So I’ll need to add a bit more logic than the overly simple diff above.

Example of where this was introduced in tmux: https://github.com/tmux/tmux/commit/62d2ab3e687bfc7e0a02adedee30314b8ef1b08b
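The extra logic mentioned in the edit could look roughly like this. It is a minimal sketch, assuming the goal is simply to resume a stopped child instead of treating it as dead. `handle_child_status` and `reap_children` are hypothetical names, not functions from dvtm, tmux, or screen, and the real projects may apply further conditions.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: interprets a status returned by
 * waitpid(..., WUNTRACED).
 * Returns 1 if the child terminated, 0 if it was only stopped and has been
 * sent SIGCONT, -1 on error or an unexpected status. */
int handle_child_status(pid_t pid, int status) {
    if (WIFSTOPPED(status)) {
        /* A stopped child is not dead: resume it rather than dropping it. */
        return kill(pid, SIGCONT) == 0 ? 0 : -1;
    }
    if (WIFEXITED(status) || WIFSIGNALED(status))
        return 1;
    return -1;
}

/* Drains all pending child state changes without blocking.
 * Returns the number of children that actually terminated. */
int reap_children(void) {
    int status, dead = 0;
    pid_t pid;

    /* waitpid() returns 0 when nothing has changed and -1 (e.g. ECHILD)
     * when there are no children left; both end the loop. */
    while ((pid = waitpid(-1, &status, WNOHANG | WUNTRACED)) > 0) {
        if (handle_child_status(pid, status) == 1)
            dead++;
    }
    return dead;
}
```

With this shape, a child that stops itself is reported by `waitpid()` once as stopped and, after the SIGCONT, later as exited, so only the second report would mark the client as dead.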

Yes, as mentioned in my first response it is related to the handling of SIGCHLD.

I can give you an overview of how it is supposed to work. Conceptually, we want to block and be notified when either a signal occurs (SIGCHLD: a client process terminated; SIGWINCH: the terminal was resized) or I/O is available, whether from standard input (user keyboard input), a client process, the statusbar FIFO, etc.

This is realized as follows:

  1. block SIGCHLD/SIGWINCH signals
  2. check whether a client has died, if so remove it
  3. call pselect with an empty signal mask, which atomically
    • unblocks all signals, hence if a SIGCHLD/SIGWINCH was pending we will now receive it
    • performs the select and blocks; if a signal occurs, select returns -1 with errno set to EINTR (this is crucial, does it happen on Mac OS X?)
    • restores the signal mask, i.e. once again blocks SIGCHLD/SIGWINCH
  4. redraw if necessary
  5. goto 1

See this lwn.net article on why pselect is necessary. However, the mentioned race condition should not cause an infinite hang. In the worst case, we would block until some other event occurs. For example, creating a new window should then remove the old one.

Another thing to investigate is whether the signal handler actually gets called, but for some reason the died property of the client is not set.

When debugging this kind of issue, it is extremely useful to get a trace at the syscall layer, as produced by strace(1) on Linux.

Another option is to try to come up with a minimal reproducible test case. How does the following program behave? Run it, then use kill -WINCH <pid> from another terminal; it should print a message related to the signal. Now press c<Enter>; this should fork a new process and print another message when it dies.

#define _DARWIN_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#define _XOPEN_SOURCE 700
#define _XOPEN_SOURCE_EXTENDED

#include <unistd.h>
#include <stdlib.h>
#include <signal.h>
#include <stdio.h>
#include <stdbool.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/select.h>

volatile sig_atomic_t died = false;
volatile sig_atomic_t resized = false;

void sigchld_handler(int sig) {
    int status;
    pid_t pid;
    /* printf() is not async-signal-safe; tolerable only in a throwaway test */
    printf("SIGCHLD handler\n");
    while ((pid = waitpid(-1, &status, WNOHANG)) != 0) {
        if (pid == -1)
            break;
        printf("child with pid %d died\n", pid);
    }
    died = true;
}

void sigwinch_handler(int sig) {
    printf("SIGWINCH handler\n");
    resized = true;
}

int main(int argc, char *argv[]) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = sigchld_handler;
    sigaction(SIGCHLD, &sa, NULL);
    sa.sa_handler = sigwinch_handler;
    sigaction(SIGWINCH, &sa, NULL);
    sigset_t emptyset, blockset;
    sigemptyset(&emptyset);
    sigemptyset(&blockset);
    sigaddset(&blockset, SIGWINCH);
    sigaddset(&blockset, SIGCHLD);
    sigprocmask(SIG_BLOCK, &blockset, NULL);

    printf("parent pid: %d\n", getpid());

    for (;;) {
        int r, nfds = 0;
        fd_set rd;
        FD_ZERO(&rd);
        FD_SET(STDIN_FILENO, &rd);

        if (resized) {
            printf("mainloop: need resize\n");
            resized = false;
        }

        if (died) {
            printf("mainloop: process died\n");
            exit(EXIT_SUCCESS);
        }

        r = pselect(nfds + 1, &rd, NULL, NULL, NULL, &emptyset);

        if (r == -1 && errno == EINTR)
            continue;

        if (r < 0) {
            printf("select() error\n");
            exit(EXIT_FAILURE);
        }

        if (FD_ISSET(STDIN_FILENO, &rd)) {
            char key;
            if (read(STDIN_FILENO, &key, 1) != 1) {
                printf("read error\n");
                continue;
            }
            if (key == 'c') {
                pid_t pid = fork();
                if (pid == -1) {
                    printf("fork failure\n");
                    exit(EXIT_FAILURE);
                }
                if (pid == 0) {
                    sleep(2);
                    exit(EXIT_SUCCESS);
                }
                printf("child with pid %d forked\n", pid);
            } else if (key == 'q') {
                exit(EXIT_SUCCESS);
            } else if (key != '\n') {
                printf("key: %c\n", key);
            }
        }
    }

    return 0;
}