podman: podman hangs when sending large amounts of data on stdout

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

When trying to pipe large amounts of data via stdout, podman often hangs. There was a similar issue, #9183, about this before, and apparently it was fixed in containers/conmon#236, but a year later I still see an extremely similar issue on new versions.

Steps to reproduce the issue:

This is a more or less 100% reliable hang for me:

podman run --rm docker.io/library/node:latest tar c / >/dev/null

If you want, you can also run it through pv to see how much data was written. It tends to stop after 3.23MiB for me:

bash-5.1$ podman run --rm docker.io/library/node:latest tar c / | pv >/dev/null
tar: Removing leading `/' from member names
tar: Removing leading `/' from hard link targets
3.23MiB 0:00:14 [0.00 B/s] [<=>                                               ]

Describe the results you received:

Both of those hang.

I’m pretty sure this isn’t caused by tar getting stuck on reading a file, since, for example podman run --rm docker.io/library/node:latest tar cf /dev/null / >/dev/null works fine and exits quickly.
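
For reference, here are the two invocations side by side (both taken from the description above); only the one that streams the archive over the container's stdout hangs:

# hangs: the archive is streamed over the container's stdout
podman run --rm docker.io/library/node:latest tar c / >/dev/null

# exits quickly: the archive goes to /dev/null inside the container,
# so almost nothing is written to stdout
podman run --rm docker.io/library/node:latest tar cf /dev/null / >/dev/null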

Note as well that tar isn’t running anymore by the time of the hang:

...
 716226 ?        Ssl    0:04  \_ /usr/libexec/gnome-terminal-server
 716276 pts/3    Ss     0:00      \_ bash
 717560 pts/3    Sl+    0:00      |   \_ podman run --rm docker.io/library/node:latest tar c /
 717574 pts/3    Z      0:00      |       \_ [slirp4netns] <defunct>
...

The hang can be interrupted by pressing ^C. When you do that, you immediately see this message:

^C2022-04-05T13:24:41.000713801Z: open pidfd: No such process

followed exactly 5 seconds later by this:

ERRO[0094] No exit file for container 60fd89b9fad660d41b7d2d0f00c27b732aa5c77beb7cfc65db905a25ffaa332b found: timed out waiting for file /run/user/1001/libpod/tmp/exits/60fd89b9fad660d41b7d2d0f00c27b732aa5c77beb7cfc65db905a25ffaa332b: internal libpod error 
ERRO[0094] forwarding signal 2 to container 60fd89b9fad660d41b7d2d0f00c27b732aa5c77beb7cfc65db905a25ffaa332b: error sending signal to container 60fd89b9fad660d41b7d2d0f00c27b732aa5c77beb7cfc65db905a25ffaa332b: `/usr/bin/crun kill 60fd89b9fad660d41b7d2d0f00c27b732aa5c77beb7cfc65db905a25ffaa332b 2` failed: exit status 1 

…followed by an exit with status 130.

Describe the results you expected:

Not hanging.

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

Client:       Podman Engine
Version:      4.0.0-rc4
API Version:  4.0.0-rc4
Go Version:   go1.18beta2

Built:      Fri Feb 11 15:51:09 2022
OS/Arch:    linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.24.0
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.0-2.fc36.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.0, commit: '
  cpus: 8
  distribution:
    distribution: fedora
    variant: silverblue
    version: "36"
  eventLogger: journald
  hostname: x1
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
  kernel: 5.17.0-0.rc7.116.fc36.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 13072580608
  memTotal: 33350156288
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.4.2-2.fc36.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.4.2
      commit: f6fbc8f840df1a414f31a60953ae514fa497c748
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    path: /run/user/1001/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-0.2.beta.0.fc36.x86_64
    version: |-
      slirp4netns version 1.2.0-beta.0
      commit: 477db14a24ff1a3de3a705e51ca2c4c1fe3dda64
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 8000892928
  swapTotal: 8589930496
  uptime: 165h 18m 1.58s (Approximately 6.88 days)
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /var/home/lis/.config/containers/storage.conf
  containerStore:
    number: 43
    paused: 0
    running: 20
    stopped: 23
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/home/lis/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 8
  runRoot: /run/user/1001/containers
  volumePath: /var/home/lis/.local/share/containers/storage/volumes
version:
  APIVersion: 4.0.0-rc4
  Built: 1644591069
  BuiltTime: Fri Feb 11 15:51:09 2022
  GitCommit: ""
  GoVersion: go1.18beta2
  OsArch: linux/amd64
  Version: 4.0.0-rc4

Package info (e.g. output of rpm -q podman or apt list podman):

podman-4.0.0-0.6.rc4.fc36.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

No

Additional environment details (AWS, VirtualBox, physical, etc.):

Default install of Silverblue 36, standard issue RH ThinkPad X1 Gen 9.

Since this is potentially a bug in conmon again, here are the relevant versions:

bash-5.1$ conmon --version
conmon version 2.1.0
commit: 
bash-5.1$ rpm -q conmon
conmon-2.1.0-2.fc36.x86_64

Most upvoted comments

This is not really a podman problem; the issue is that conmon crashes with a segfault, so it has to be fixed in conmon.

Is this still an issue, or did it get lost in the issue flood?

I fixed it by redirecting the log output into a file in my entrypoint script. (Using podman 4.5.1)
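
The script itself isn't shown above; as a rough sketch of that kind of workaround (run-build.sh and the log path are just placeholders), the idea is that output redirected to a file inside the container never goes through conmon's stdout/stderr handling:

#!/bin/bash
# hypothetical entrypoint.sh: keep the chatty build output out of the
# container's stdout so conmon never has to forward it to journald
exec ./run-build.sh > /var/log/build.log 2>&1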

Still an issue with 4.4.3, using a kube YAML with a bash script as the entrypoint to compile AOSP.

Compiling AOSP produces a lot of logging output on the terminal. The container just “freezes”; I can still access it with exec, but the processes consume 0% CPU. The log output of course hangs as well, since there is 0% CPU usage.

I’d vote for changing the default for --interactive mode to --log-driver=none, and allowing opt-in to interactive-mode logging by setting it to some other value.
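
If someone needs that behaviour today, it can already be requested per container; a hedged example, combining the reproducer from this issue with the none log driver that podman info lists under plugins.log:

# opt out of conmon's journald log forwarding for this one container
podman run --rm --log-driver=none docker.io/library/node:latest tar c / >/dev/null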

I can reproduce. Conmon crashes with SIGSEGV. I see the following in strace output:

24733 write(2, "[conmon:w]: Unknown error -14 Bad file descriptor\n", 50) = 50
24733 close(18)                         = 0
24733 write(2, "[conmon:w]: write_journald failed\n", 34) = 34
...
24733 sendmsg(19, {msg_name={sa_family=AF_UNIX, sun_path="/run/systemd/journal/socket"}, msg_namelen=30, msg_iov=[{iov_base="MESSAGE=\350\253\341\376\377\213\0\0\0\37\204\0\0\0\0\0\221\1\0\0\0\0\0\0I\33\0\0\0\0\0\0G\320*\274\362O\200k`\232\264\1\0\0\0\0`\232\264\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=851}, {iov_base="\n", iov_len=1}, {iov_base="CONTAINER_ID_FULL=6d3a03f10916fa47418398fdc88b1d023c57bb93557027ea1d89063f08671473", iov_len=82}, {iov_base="\n", iov_len=1}, {iov_base="PRIORITY=6", iov_len=10}, {iov_base="\n", iov_len=1}, {iov_base="CONTAINER_ID=6d3a03f10916", iov_len=25}, {iov_base="\n", iov_len=1}, {iov_base="CONTAINER_NAME=strange_curie", iov_len=28}, {iov_base="\n", iov_len=1}, {iov_base="SYSLOG_IDENTIFIER=strange_curie", iov_len=31}, {iov_base="\n", iov_len=1}, {iov_base="CODE_FILE=src/ctr_logging.c", iov_len=27}, {iov_base="\n", iov_len=1}, {iov_base="CODE_LINE=323", iov_len=13}, {iov_base="\n", iov_len=1}, {iov_base="CODE_FUNC=write_journald", iov_len=24}, {iov_base="\n", iov_len=1}], msg_iovlen=18, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 1100
24733 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x1b5f000} ---
...
24733 +++ killed by SIGSEGV (core dumped) +++
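
The exact strace invocation isn't shown above; one plausible way to capture a similar trace, assuming the conmon process for the hung container is the most recently started one, would be:

# attach to the newest conmon process, follow forks, log syscalls to a file
strace -f -tt -p "$(pgrep -n conmon)" -o conmon.strace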

It works correctly with --log-driver k8s-file instead of the journald driver.
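
Applied to the reproducer from this issue, that workaround would look like:

# same reproducer, but with the k8s-file log driver instead of journald
podman run --rm --log-driver k8s-file docker.io/library/node:latest tar c / >/dev/null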

@mheon @haircommander PTAL