podman: podman hangs when sending large amounts of data on stdout
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
When trying to pipe large amounts of data via stdout, podman often hangs. There was a similar issue (#9183) about this before, and it was apparently fixed in containers/conmon#236, but I still see an extremely similar issue here, a year later, on current versions.
Steps to reproduce the issue:
This hangs more or less 100% reliably for me:
podman run --rm docker.io/library/node:latest tar c / >/dev/null
If you want, you can also run it through pv to see how much data was written. It tends to stop after 3.23MiB for me:
bash-5.1$ podman run --rm docker.io/library/node:latest tar c / | pv >/dev/null
tar: Removing leading `/' from member names
tar: Removing leading `/' from hard link targets
3.23MiB 0:00:14 [0.00 B/s] [<=> ]
Describe the results you received:
Both of those hang.
I’m pretty sure this isn’t caused by tar getting stuck reading a file, since, for example, podman run --rm docker.io/library/node:latest tar cf /dev/null / >/dev/null works fine and exits quickly.
Note as well that tar isn’t running anymore by the time of the hang:
...
716226 ? Ssl 0:04 \_ /usr/libexec/gnome-terminal-server
716276 pts/3 Ss 0:00 \_ bash
717560 pts/3 Sl+ 0:00 | \_ podman run --rm docker.io/library/node:latest tar c /
717574 pts/3 Z 0:00 | \_ [slirp4netns] <defunct>
...
The hang can be interrupted by pressing ^C. When you do that, you immediately see this message:
^C2022-04-05T13:24:41.000713801Z: open pidfd: No such process
followed exactly 5 seconds later by this:
ERRO[0094] No exit file for container 60fd89b9fad660d41b7d2d0f00c27b732aa5c77beb7cfc65db905a25ffaa332b found: timed out waiting for file /run/user/1001/libpod/tmp/exits/60fd89b9fad660d41b7d2d0f00c27b732aa5c77beb7cfc65db905a25ffaa332b: internal libpod error
ERRO[0094] forwarding signal 2 to container 60fd89b9fad660d41b7d2d0f00c27b732aa5c77beb7cfc65db905a25ffaa332b: error sending signal to container 60fd89b9fad660d41b7d2d0f00c27b732aa5c77beb7cfc65db905a25ffaa332b: `/usr/bin/crun kill 60fd89b9fad660d41b7d2d0f00c27b732aa5c77beb7cfc65db905a25ffaa332b 2` failed: exit status 1
…followed by an exit with status 130.
Describe the results you expected:
Not hanging.
Additional information you deem important (e.g. issue happens only occasionally):
Output of podman version:
Client: Podman Engine
Version: 4.0.0-rc4
API Version: 4.0.0-rc4
Go Version: go1.18beta2
Built: Fri Feb 11 15:51:09 2022
OS/Arch: linux/amd64
Output of podman info --debug:
host:
arch: amd64
buildahVersion: 1.24.0
cgroupControllers:
- cpu
- io
- memory
- pids
cgroupManager: systemd
cgroupVersion: v2
conmon:
package: conmon-2.1.0-2.fc36.x86_64
path: /usr/bin/conmon
version: 'conmon version 2.1.0, commit: '
cpus: 8
distribution:
distribution: fedora
variant: silverblue
version: "36"
eventLogger: journald
hostname: x1
idMappings:
gidmap:
- container_id: 0
host_id: 1001
size: 1
- container_id: 1
host_id: 165536
size: 65536
uidmap:
- container_id: 0
host_id: 1001
size: 1
- container_id: 1
host_id: 165536
size: 65536
kernel: 5.17.0-0.rc7.116.fc36.x86_64
linkmode: dynamic
logDriver: journald
memFree: 13072580608
memTotal: 33350156288
networkBackend: netavark
ociRuntime:
name: crun
package: crun-1.4.2-2.fc36.x86_64
path: /usr/bin/crun
version: |-
crun version 1.4.2
commit: f6fbc8f840df1a414f31a60953ae514fa497c748
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
os: linux
remoteSocket:
path: /run/user/1001/podman/podman.sock
security:
apparmorEnabled: false
capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
rootless: true
seccompEnabled: true
seccompProfilePath: /usr/share/containers/seccomp.json
selinuxEnabled: true
serviceIsRemote: false
slirp4netns:
executable: /usr/bin/slirp4netns
package: slirp4netns-1.2.0-0.2.beta.0.fc36.x86_64
version: |-
slirp4netns version 1.2.0-beta.0
commit: 477db14a24ff1a3de3a705e51ca2c4c1fe3dda64
libslirp: 4.6.1
SLIRP_CONFIG_VERSION_MAX: 3
libseccomp: 2.5.3
swapFree: 8000892928
swapTotal: 8589930496
uptime: 165h 18m 1.58s (Approximately 6.88 days)
plugins:
log:
- k8s-file
- none
- passthrough
- journald
network:
- bridge
- macvlan
volume:
- local
registries:
search:
- registry.fedoraproject.org
- registry.access.redhat.com
- docker.io
- quay.io
store:
configFile: /var/home/lis/.config/containers/storage.conf
containerStore:
number: 43
paused: 0
running: 20
stopped: 23
graphDriverName: overlay
graphOptions: {}
graphRoot: /var/home/lis/.local/share/containers/storage
graphStatus:
Backing Filesystem: btrfs
Native Overlay Diff: "true"
Supports d_type: "true"
Using metacopy: "false"
imageCopyTmpDir: /var/tmp
imageStore:
number: 8
runRoot: /run/user/1001/containers
volumePath: /var/home/lis/.local/share/containers/storage/volumes
version:
APIVersion: 4.0.0-rc4
Built: 1644591069
BuiltTime: Fri Feb 11 15:51:09 2022
GitCommit: ""
GoVersion: go1.18beta2
OsArch: linux/amd64
Version: 4.0.0-rc4
Package info (e.g. output of rpm -q podman or apt list podman):
podman-4.0.0-0.6.rc4.fc36.x86_64
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)
No
Additional environment details (AWS, VirtualBox, physical, etc.):
Default install of Silverblue 36, standard issue RH ThinkPad X1 Gen 9.
Since this is potentially a bug in conmon again, the relevant versions:
bash-5.1$ conmon --version
conmon version 2.1.0
commit:
bash-5.1$ rpm -q conmon
conmon-2.1.0-2.fc36.x86_64
About this issue
- Original URL
- State: open
- Created 2 years ago
- Comments: 36 (17 by maintainers)
Commits related to this issue
- ztp: Disable podman logging when extracting container data As described in https://github.com/containers/podman/issues/13779, podman by default logs all of a container's stdout in order to retain inf... — committed to lack/cnf-features-deploy by lack 2 years ago
This is not really a podman problem; the issue is that conmon crashes with a segfault, so it has to be fixed in conmon.
Is this still an issue, or was it lost in the issue flood?
I fixed it by redirecting the log output into a file in my entrypoint script. (Using podman 4.5.1)
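For anyone who wants the same workaround, here is a minimal sketch of such an entrypoint. The script name and log path are my own choices, not anything podman prescribes; the idea is just that nothing noisy reaches the container's stdout, so conmon never has to relay it.

```shell
#!/bin/sh
# Hypothetical entrypoint sketch: redirect all stdout/stderr into a file so
# the bulk output never goes through conmon's logging pipe.
LOG="${LOG:-/tmp/entrypoint.log}"   # assumed path; mount a volume to keep it
exec >"$LOG" 2>&1                   # everything after this line goes to $LOG
echo "noisy build output now lands in $LOG"
# ...exec the real workload here, e.g.: exec make -j"$(nproc)"
```

With this in place the container's stdout stays quiet, which is what the commenter above reports avoided the hang on 4.5.1.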
Still an issue with 4.4.3, using a kube YAML with a bash script as the entrypoint to compile AOSP.
Compiling AOSP produces a lot of logging output to the terminal. The container just “freezes”. I can still access it with exec, but the processes consume 0% CPU. The logs output hangs as well, of course, since there is 0% CPU usage.
I’d vote for changing the default for --interactive mode to --log-driver=none, and allowing opt-in to interactive-mode logging by setting it to some other value.
I can reproduce. conmon crashes with SIGSEGV. I see the following in strace output:
It works correctly with --log-driver k8s-file instead of the journald driver.
@mheon @haircommander PTAL
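To make the two workarounds from this thread concrete, here are both forms as full commands against the reproducer from this issue (--log-driver is a standard podman run flag; none and k8s-file are log drivers podman info lists above):

```shell
# Disable podman's log collection entirely for this run:
podman run --rm --log-driver=none docker.io/library/node:latest tar c / >/dev/null

# ...or keep logging, but write to a file via the k8s-file driver
# instead of going through journald:
podman run --rm --log-driver=k8s-file docker.io/library/node:latest tar c / >/dev/null
```

Note that with --log-driver=none, podman logs will have nothing to show for the container afterwards; that is the trade-off the comment above proposes making the default for interactive runs.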