runc: runc create hangs when passed a pipe for stdout or stderr
I found this by playing with various options in golang `cmd := exec.Command("runc")` and changing `cmd.Stdout`, etc. to try and capture output from a container run. It works if I make stdout `os.Stdout` or an actual file, but anything else causes it to hang.
For example:
- `cmd.Stdout = io.MultiWriter(os.Stdout)`
- `cmd.Stdout = io.MultiWriter(file)`
- `pipe, err := cmd.StdoutPipe()` - hangs when you do `io.Copy(os.Stdout, pipe)`
A sample with this is here
But it turns out I can recreate the problem simply:
$ runc create echo | cat
The above hangs.
Is this because of some strange pty/tty handling? How would I work around it if I need to capture stdout from a `runc create` and possibly send it to other areas? E.g. `runc create echo | tee /tmp/foo`
About this issue
- Original URL
- State: open
- Created 6 years ago
- Comments: 29 (27 by maintainers)
@deitch [I am currently on vacation, so sorry for the brief response.]
I believe what you wrote is effectively correct, but I’ll read through it again when I have some more time.
Effectively yes. When you open a new shell on your machine, you’ll find that `/proc/self/fd/1` and `/proc/self/fd/2` are the same PTY. You describe this as a short-coming in #1730 – I don’t disagree (and I was thinking of making them different PTYs when I first wrote the code), but from memory there is some process management voodoo that breaks if you have a non-controlling terminal as your stdio (because terminals are super magical in Unix).

@cyphar read through this a number of times. As far as I can tell, if you set `terminal: true` (which appears to be what you recommend), then `runc` will create a new pty, and, if `--console-socket` is provided, it will send the fd for that pty to the socket passed.

However, this does not change what the container’s process has, only allows you to “tap into it”. The net effect appears to be that I (outside process) can feed to its stdin by writing to that fd, and I can read the combined stdout/stderr by reading from that fd.

But I cannot, e.g., separate stderr from stdout (i.e. fds 1 and 2 are combined onto that passed fd). If I wanted to do that, I would need to run it foreground.

Is that correct? I feel like it may be inherent to the nature of setting up a new pty, i.e. (as you put it) “basically booting a separate operating system” means you expect a terminal as a console, and that terminal doesn’t really have separate stderr/stdout or pipes to/from stdio, just a console.

If you want to do more “regular Unix-y stdio things” (yes, I really just wrote that… 😃 ), then you need to run foreground or `terminal: false`. Which is fine, if not recommended, except for the oddness around it hanging, which is what started this issue.

Understanding correct?
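For the "run it foreground" option mentioned above, keeping stdout and stderr apart in Go could look roughly like the sketch below. The `capture` helper is a hypothetical name, and the `sh` command is only a stand-in so the example runs anywhere. Substituting something like `capture("runc", "run", "<container-id>")` with `terminal: false` in the bundle config is an assumption that has not been verified against runc here.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// capture runs a command in the foreground and returns its stdout and stderr
// as two separate strings.
func capture(name string, args ...string) (string, string, error) {
	var stdout, stderr bytes.Buffer
	cmd := exec.Command(name, args...)
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	err := cmd.Run() // returns once the command exits and both streams are drained
	return stdout.String(), stderr.String(), err
}

func main() {
	// Stand-in for a foreground container run: one line on each stream.
	out, errOut, err := capture("sh", "-c", "echo to-stdout; echo to-stderr >&2")
	fmt.Printf("stdout=%q stderr=%q err=%v\n", out, errOut, err)
}
```

This only works when the launched command itself stays alive until the workload finishes, which is the foreground case; with a PTY there is a single stream and nothing to separate.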
@deitch `--console-socket` only applies if you have `terminal: true` – this means “please create the pty for me, from inside the container’s `/dev/pts/...` mount, during the rest of the container setup”. The “from inside the container” part is quite important, and there were many bugs (including `sudo` not working inside containers) that were caused by the pty not being created inside the container.

Now, you could manually do the above instead of using `--console-socket` (by doing a `runc exec` and then emulating what I just listed by opening a socket and using `--preserve-fds` for the container process to inherit it), but in order for Docker and other upstream users to be able to have a sane terminal setup we just added an option for it.

As for why we don’t have `--stdin`, `--stdout`, `--stderr` for `terminal: false`? Though I don’t like the flag names (and would prefer if `--preserve-fds` was more general purpose and didn’t have its current lets-just-copy-Go semantics), this is something that I’ve been pushing for ever since I worked on #1018 (and if you wanna go for a long read, you’ll see the discussion we had about this at the time) – because the current implementation of “just inherit the stdio fds” is:

- Way too magical, for an already-pretty-magical part of the codebase. People get hit by this weirdness all the time, and it just confuses everyone. It also makes the code kinda complex (but the alternative is also complex).
- Ripe for security bugs, as inheriting the stdio fds for something like your terminal can be quite dangerous (see `ioctl_tty(4)` and `TIOCSTI`). Same goes for inheriting file descriptors that refer to directories from within the host mount namespace (very unsafe).

The main arguments that are against having specific flags for stdio (or extending `--preserve-fds` as I mentioned) are:

- Adding more flags to stdio handling will make it more complicated, and it’s already far too complicated. Not to mention that the flags will be useless in “non-detached” mode, making the handling even more complicated. This is quite a reasonable argument, and is the reason I suggest extending `--preserve-fds` (though `--preserve-fds` for stdio would still be more complicated in the non-detach case – since in that case `runc run` will use pipes for the stdio).
- It would break the “out of the box” case for detached containers – running it in your terminal. I used to think that this argument was okay, but I’ve recently changed my mind after actually trying to imagine what a new user would see if they tried this. In short, a new user would see that their terminal has been hijacked as both the container and their shell try to read from the same terminal. This is not an out-of-the-box case that makes sense.
- It is not clear what the default stdio should be in the detached case. There are obviously incorrect answers (create a logfile for stdout and stderr – like `nohup` does), but the most obvious answer (just use `/dev/null`) is also not without its downsides. In particular, this would mean that in the “out of the box” case, detached containers that have a shell as their `pid1` will exit prematurely (shells don’t like their `stdin` being `/dev/null`). While this all makes sense if you understand what’s going on – most new users won’t know why their container didn’t survive for more than a split second. The other option would be to simply not let users start a detached container without specifying stdio, which would mirror the requirements for `--console-socket`. I think this is nicer overall, but also makes onboarding more complicated than it was previously.
- Our detached handling is effectively identical to `&` in a shell – so users should already know how things will look. While I agree `runc run -d` might superficially look like `runc run &`, I don’t think it makes sense to use this similarity as an argument for keeping it.

@deitch
`--console-socket /path/to/unix/socket` tells runc to send the file descriptor for the master end of the PTY device through the socket using SCM_RIGHTS (in case you’re not aware, on Unix you can send file descriptors through sockets and the receiving end then has a `dup` of that file descriptor). `runc` creates the PTY, but it sends the master to another process since you asked `runc` to exit (the master end needs to be open otherwise the container will get a `SIGHUP` or something similar, and you won’t be able to interact with the stdio – and `exit` will close the file descriptor that `runc` has for the master).

#1018 implemented `--console-socket` and (while it’s quite a long thread) that includes the discussion over the interface as well as usecases and so on (but yes – that also means that I’m the one to blame for this one 😉).

I am not sure, and I will take a look at this when I get a chance.
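The descriptor passing that `--console-socket` relies on can be illustrated with the standard library alone. This is a Linux-only sketch and not runc's implementation: `sendFD`, `recvFD` and `roundTrip` are hypothetical helper names, a socketpair stands in for the Unix socket at the console-socket path, and a pipe's write end stands in for the PTY master. Descriptors travel as `SCM_RIGHTS` ancillary data.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"syscall"
)

// sendFD sends one file descriptor over a Unix socket as SCM_RIGHTS data.
// One byte of ordinary payload accompanies the control message.
func sendFD(sock, fd int) error {
	return syscall.Sendmsg(sock, []byte{0}, syscall.UnixRights(fd), nil, 0)
}

// recvFD receives one descriptor. The result is a new descriptor in this
// process that refers to the same open file, much like a dup.
func recvFD(sock int) (int, error) {
	buf := make([]byte, 1)
	oob := make([]byte, syscall.CmsgSpace(4)) // room for one 32-bit fd
	_, oobn, _, _, err := syscall.Recvmsg(sock, buf, oob, 0)
	if err != nil {
		return -1, err
	}
	msgs, err := syscall.ParseSocketControlMessage(oob[:oobn])
	if err != nil {
		return -1, err
	}
	if len(msgs) != 1 {
		return -1, fmt.Errorf("expected 1 control message, got %d", len(msgs))
	}
	fds, err := syscall.ParseUnixRights(&msgs[0])
	if err != nil {
		return -1, err
	}
	if len(fds) != 1 {
		return -1, fmt.Errorf("expected 1 fd, got %d", len(fds))
	}
	return fds[0], nil
}

// roundTrip passes a pipe's write end through a socketpair, writes msg via
// the received copy, and returns what arrives at the pipe's read end.
func roundTrip(msg string) (string, error) {
	pair, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
	if err != nil {
		return "", err
	}
	defer syscall.Close(pair[0])
	defer syscall.Close(pair[1])

	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}
	defer r.Close()

	err = sendFD(pair[0], int(w.Fd()))
	w.Close() // the sender's copy is not needed once the message is queued
	if err != nil {
		return "", err
	}

	fd, err := recvFD(pair[1])
	if err != nil {
		return "", err
	}
	received := os.NewFile(uintptr(fd), "received")
	_, err = received.WriteString(msg)
	received.Close() // last write end closed, so the reader sees EOF
	if err != nil {
		return "", err
	}
	data, err := io.ReadAll(r)
	return string(data), err
}

func main() {
	got, err := roundTrip("hello through a passed fd\n")
	fmt.Printf("%q %v\n", got, err)
}
```

Note that the sender can close its own copy straight away and the file stays open through the receiver's copy. A caller that wants the PTY master would presumably do the receiving half on a connection accepted from its own listening socket.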
(We really need to document this somewhere other than issues, or I should come up with a canned reply to questions about `terminal`.)

There are three main cases that can occur with pty handling:

- `runc create` or `runc run -d` indicates that `runc` will not stick around – there will be no `runc` code running after the container is set up. This means that `runc` cannot do any IO forwarding – it has to be set up by the caller. The way this restriction manifests depends on the terminal setting.
  - `terminal: true` means that a PTY will be set up by `runc` – but `runc` cannot `exit(2)` without another process having the file descriptor for the master PTY. This is done by sending the fd through the `--console-socket=` argument.
  - `terminal: false` just means that the stdio fds will be inherited by the container (whatever they are) as the PTY will not be set up. In other words, `0..2` are set as the container’s stdio. Be aware that this can cause security issues if you are not very careful when you use it (assuming you’re running untrusted code in the container).
- `runc run` means that `runc` will stick around doing IO forwarding. In this case, `terminal: true` and `terminal: false` will look similar (except for the fact that `terminal: true` creates a PTY and `terminal: false` doesn’t).

So, if you’re piping `runc run -d` to a file what’s actually happening is that the container process’s stdio FDs are the pipe FD (you should be very careful when doing `runc run -d ctr >some_file` because it gives the container indirect access to the host filesystem). Hopefully that better explains what is happening when you use `terminal: ...`.