runc: runc create hangs when passed a pipe for stdout or stderr

I found this while playing with various options for Go's cmd := exec.Command("runc") and changing cmd.Stdout, etc., to try to capture output from a container run. It works if I set stdout to os.Stdout or to an actual file, but anything else causes it to hang.

For example:

  • cmd.Stdout = io.MultiWriter(os.Stdout)
  • cmd.Stdout = io.MultiWriter(file)
  • pipe := cmd.StdoutPipe() - hangs when you do io.Copy(os.Stdout, pipe)

A sample with this is here
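
For concreteness, here is a minimal sketch of the kind of harness that triggers it (the "echo" container ID and the bundle path are made-up placeholders, not the linked sample):

package main

import (
	"io"
	"os"
	"os/exec"
)

func main() {
	cmd := exec.Command("runc", "create", "echo")
	cmd.Dir = "/path/to/bundle" // hypothetical bundle directory

	// Setting cmd.Stdout to os.Stdout or to an *os.File works; asking Go for
	// a pipe (as below) reproduces the hang.
	pipe, err := cmd.StdoutPipe()
	if err != nil {
		panic(err)
	}
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	go io.Copy(os.Stdout, pipe) // never completes while the create hangs
	if err := cmd.Wait(); err != nil {
		panic(err)
	}
}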

But it turns out I can recreate the problem simply:

$ runc create echo | cat

The above hangs.

Is this because of some strange pty/tty handling? How would I work around it if I need to capture stdout from a runc create and possibly send it to other destinations? E.g. runc create echo | tree /tmp/foo

About this issue

  • State: open
  • Created 6 years ago
  • Comments: 29 (27 by maintainers)

Most upvoted comments

@deitch [I am currently on vacation, so sorry for the brief response.]

I believe what you wrote is effectively correct, but I’ll read through it again when I have some more time.

Is that correct? I feel like it may be inherent to the nature of setting up a new pty, i.e. (as you put it) “basically booting a separate operating system” means you expect a terminal as a console, and that terminal doesn’t really have separate stderr/stdout or pipes to/from stdio, just a console.

Effectively yes. When you open a new shell on your machine, you’ll find that /proc/self/fd/1 and /proc/self/fd/2 are the same PTY. You describe this as a shortcoming in #1730 – I don’t disagree (and I was thinking of making them different PTYs when I first wrote the code), but from memory there is some process management voodoo that breaks if you have a non-controlling terminal as your stdio (because terminals are super magical in Unix).
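
For concreteness, this is easy to check from Go (nothing runc-specific, just readlink on the proc entries):

package main

import (
	"fmt"
	"os"
)

func main() {
	// In an interactive shell both fds normally resolve to the same PTY,
	// e.g. /dev/pts/0.
	for _, fd := range []string{"1", "2"} {
		target, err := os.Readlink("/proc/self/fd/" + fd)
		if err != nil {
			fmt.Println("fd", fd, "error:", err)
			continue
		}
		fmt.Println("fd", fd, "->", target)
	}
}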

@cyphar I read through this a number of times. As far as I can tell, if you set terminal: true (which appears to be what you recommend), then runc will create a new pty and, if --console-socket is provided, it will send the fd for that pty to the socket passed.

However, this does not change what the container’s process has; it only allows you to “tap into” it. The net effect appears to be that I (the outside process) can feed its stdin by writing to that fd, and can read the combined stdout/stderr by reading from that fd.

But I cannot, e.g., separate stderr from stdout (i.e. fds 1 and 2 are combined onto that passed fd). If I wanted to do that, I would need to run it in the foreground.

Is that correct? I feel like it may be inherent to the nature of setting up a new pty, i.e. (as you put it) “basically booting a separate operating system” means you expect a terminal as a console, and that terminal doesn’t really have separate stderr/stdout or pipes to/from stdio, just a console.

If you want to do more “regular Unix-y stdio things” (yes, I really just wrote that… 😃 ), then you need to run in the foreground or with terminal: false. Which is fine, if not recommended, except for the oddness around it hanging, which is what started this issue.

Understanding correct?

@deitch

Is that it? Why didn’t it just let me pass to it --stdin dev --stdout dev --stderr dev instead of the whole complex “have runc create it and I have to make a place to retrieve it”?

--console-socket only applies if you have terminal: true – this means “please create the pty for me, from inside the container’s /dev/pts/... mount, during the rest of the container setup”. The “from inside the container” part is quite important, and there were many bugs (including sudo not working inside containers) that were caused by the pty not being created inside the container.

Now, you could manually do the above instead of using --console-socket (by doing a runc exec and then emulating what I just listed by opening a socket and using --preserve-fds for the container process to inherit it), but in order for Docker and other upstream users to be able to have a sane terminal setup we just added an option for it.
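
To make the --preserve-fds mechanics concrete, here is a rough Go sketch of the inheritance half of that (the container ID, bundle path, and single socket pair are made up for illustration; this is not the full console emulation):

package main

import (
	"os"
	"os/exec"
	"syscall"
)

func main() {
	// A connected socket pair: we keep one end, the container inherits the other.
	// SOCK_CLOEXEC keeps the fds from leaking into other children; exec's dup2
	// clears it for the fd we explicitly pass along.
	fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM|syscall.SOCK_CLOEXEC, 0)
	if err != nil {
		panic(err)
	}
	parentEnd := os.NewFile(uintptr(fds[0]), "parent-end")
	childEnd := os.NewFile(uintptr(fds[1]), "child-end")
	defer parentEnd.Close()

	// --preserve-fds=1 asks runc to pass one extra fd (fd 3) through to the
	// container process; Go's ExtraFiles places childEnd at exactly fd 3.
	cmd := exec.Command("runc", "run", "--preserve-fds=1", "mycontainer")
	cmd.Dir = "/path/to/bundle" // hypothetical
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	cmd.ExtraFiles = []*os.File{childEnd}

	if err := cmd.Start(); err != nil {
		panic(err)
	}
	childEnd.Close() // drop our copy; the container keeps its inherited one
	// ... talk to the container process over parentEnd here ...
	if err := cmd.Wait(); err != nil {
		panic(err)
	}
}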


As for why we don’t have --stdin, --stdout, --stderr for terminal: false? Though I don’t like the flag names (and would prefer if --preserve-fds was more general purpose and didn’t have its current lets-just-copy-Go semantics), this is something that I’ve been pushing for ever since I worked on #1018 (and if you wanna go for a long read, you’ll see the discussion we had about this at the time) – because the current implementation of “just inherit the stdio fds” is:

  • Way too magical, for an already-pretty-magical part of the codebase. People get hit by this weirdness all the time, and it just confuses everyone. It also makes the code kinda complex (but the alternative is also complex).

  • Ripe for security bugs, as inheriting the stdio fds for something like your terminal can be quite dangerous (see ioctl_tty(2) and TIOCSTI). Same goes for inheriting file descriptors that refer to directories from within the host mount namespace (very unsafe).

The main arguments that are against having specific flags for stdio (or extending --preserve-fds as I mentioned) are:

  • Adding more flags to stdio handling will make it more complicated, and it’s already far too complicated. Not to mention that the flags will be useless in “non-detached” mode, making the handling even more complicated. This is quite a reasonable argument, and is the reason I suggest extending --preserve-fds (though --preserve-fds for stdio would still be more complicated in the non-detach case – since in that case runc run will use pipes for the stdio).

  • It would break the “out of the box” case for detached containers – running it in your terminal. I used to think that this argument was okay, but I’ve recently changed my mind after actually trying to imagine what a new user would see if they tried this. In short, a new user would see that their terminal has been hijacked as both the container and their shell try to read from the same terminal. This is not an out-of-the-box case that makes sense.

  • It is not clear what the default stdio should be in the detached case. There are obviously incorrect answers (create a logfile for stdout and stderr – like nohup does), but the most obvious answer (just use /dev/null) is also not without its downsides. In particular, this would mean that the “out of the box” case for detached containers that have a shell as their pid1 will exit prematurely (shells don’t like their stdin being /dev/null). While this all makes sense if you understand what’s going on – most new users won’t know why their container didn’t survive for more than a split second. The other option would be to simply not let users start a detached container without specifying stdio, which would mirror the requirements for --console-socket. I think this is nicer overall, but also makes onboarding more complicated than it was previously.

  • Our detached handling is effectively identical to & in a shell – so users should already know how things will look. While I agree runc run -d might superficially look like runc run &, I don’t think it makes sense to use this similarity as an argument for keeping it.

@deitch

I couldn’t figure that out either. What gets passed? The path to a Unix domain socket, ok, but is that socket itself intended to become one of stdout or similar? How does that align with “pty will be set up by runc”?

--console-socket /path/to/unix/socket tells runc to send the file descriptor for the master end of the PTY device through the socket using SCM_RIGHTS (in case you’re not aware, on Unix you can send file descriptors through sockets and the receiving end then has a dup of that file descriptor). runc creates the PTY, but it sends the master to another process since you asked runc to exit (the master end needs to be open otherwise the container will get a SIGHUP or something similar, and you won’t be able to interact with the stdio – and exit will close the file descriptor that runc has for the master).
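
For illustration, a rough sketch of the receiving side in Go (the socket path is made up and error handling is trimmed; runc ships a small recvtty helper in its contrib directory that does the same thing more robustly):

package main

import (
	"fmt"
	"net"
	"os"
	"syscall"
)

func main() {
	const sockPath = "/tmp/console.sock" // pass this path via --console-socket

	ln, err := net.Listen("unix", sockPath)
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	conn, err := ln.Accept()
	if err != nil {
		panic(err)
	}
	uconn := conn.(*net.UnixConn)

	// runc sends the PTY master fd as SCM_RIGHTS ancillary data.
	buf := make([]byte, 512)
	oob := make([]byte, 512)
	_, oobn, _, _, err := uconn.ReadMsgUnix(buf, oob)
	if err != nil {
		panic(err)
	}
	msgs, err := syscall.ParseSocketControlMessage(oob[:oobn])
	if err != nil {
		panic(err)
	}
	fds, err := syscall.ParseUnixRights(&msgs[0])
	if err != nil {
		panic(err)
	}

	master := os.NewFile(uintptr(fds[0]), "pty-master")
	defer master.Close()
	fmt.Println("received PTY master, fd", master.Fd())
	// Reading from master gives the container's combined stdout/stderr;
	// writing to it feeds the container's stdin.
}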

#1018 implemented --console-socket and (while it’s quite a long thread) that includes the discussion over the interface as well as usecases and so on (but yes – that also means that I’m the one to blame for this one 😉).

Why in this case does setting stdout or stderr to existing stdout/stderr or to an actual file work, but to a pipe or a buffer hang?

I am not sure, and I will take a look at this when I get a chance.

(We really need to document this somewhere other than issues, or I should come up with a canned reply to questions about terminal.)

There are three main cases that can occur with pty handling:

  • runc create or runc run -d indicates that runc will not stick around – there will be no runc code running after the container is set up. This means that runc cannot do any IO forwarding – it has to be set up by the caller. The way this restriction manifests depends on the terminal setting.
    • terminal: true means that a PTY will be set up by runc – but runc cannot exit(2) without another process having the file descriptor for the master PTY. This is done by sending the fd through the --console-socket= argument.
    • terminal: false just means that the stdio fds will be inherited by the container (whatever they are), as the PTY will not be set up. In other words, fds 0..2 are set as the container’s stdio. Be aware that this can cause security issues if you are not very careful when you use it (assuming you’re running untrusted code in the container).
  • runc run means that runc will stick around doing IO forwarding. In this case, terminal: true and terminal: false will look similar (except for the fact that terminal: true creates a PTY and terminal: false doesn’t).

So, if you pipe or redirect runc run -d, what’s actually happening is that the container process’s stdio fds are the pipe (or file) fds themselves (you should be very careful when doing runc run -d ctr >some_file because it gives the container indirect access to the host filesystem). Hopefully that better explains what is happening when you use terminal: ....
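
For completeness, here is a sketch of the detached, terminal: false capture that started this issue, using plain files (the case that works) rather than pipes; the container ID and paths are placeholders:

package main

import (
	"os"
	"os/exec"
)

func main() {
	outFile, err := os.Create("/tmp/ctr-stdout.log")
	if err != nil {
		panic(err)
	}
	defer outFile.Close()
	errFile, err := os.Create("/tmp/ctr-stderr.log")
	if err != nil {
		panic(err)
	}
	defer errFile.Close()

	devNull, err := os.Open(os.DevNull)
	if err != nil {
		panic(err)
	}
	defer devNull.Close()

	cmd := exec.Command("runc", "create", "mycontainer")
	cmd.Dir = "/path/to/bundle" // hypothetical bundle
	cmd.Stdin = devNull         // don't hand the container the caller's terminal
	cmd.Stdout = outFile        // inherited by the container as its fd 1
	cmd.Stderr = errFile        // inherited by the container as its fd 2

	if err := cmd.Run(); err != nil {
		panic(err)
	}
	// runc create returns once the container is created; the container keeps
	// the file fds, so stdout and stderr stay separated in the two log files.
}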