podman: podman hangs when sending large amounts of data on stdout

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

When trying to pipe large amounts of data via stdout, podman often hangs. There was a similar issue, #9183, about this before, and apparently it was fixed in containers/conmon#236, but a year later I still see an extremely similar issue on new versions.

Steps to reproduce the issue:

This is a more or less 100% reliable hang for me:

podman run --rm docker.io/library/node:latest tar c / >/dev/null

If you want, you can also run it through pv to see how much data was written. It tends to stop after 3.23MiB for me:

bash-5.1$ podman run --rm docker.io/library/node:latest tar c / | pv >/dev/null
tar: Removing leading `/' from member names
tar: Removing leading `/' from hard link targets
3.23MiB 0:00:14 [0.00 B/s] [<=>                                               ]

Describe the results you received:

Both of those hang.

I’m pretty sure this isn’t caused by tar getting stuck on reading a file, since, for example podman run --rm docker.io/library/node:latest tar cf /dev/null / >/dev/null works fine and exits quickly.
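
For reference, here are the two invocations side by side (both taken from the description above); only the one that streams the archive over the container's stdout hangs:

# hangs: the archive is streamed over the container's stdout
podman run --rm docker.io/library/node:latest tar c / >/dev/null

# exits quickly: the archive goes to /dev/null inside the container,
# so almost nothing is written to stdout
podman run --rm docker.io/library/node:latest tar cf /dev/null / >/dev/null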

Note as well that tar isn’t running anymore by the time of the hang:

...
 716226 ?        Ssl    0:04  \_ /usr/libexec/gnome-terminal-server
 716276 pts/3    Ss     0:00      \_ bash
 717560 pts/3    Sl+    0:00      |   \_ podman run --rm docker.io/library/node:latest tar c /
 717574 pts/3    Z      0:00      |       \_ [slirp4netns] <defunct>
...

The hang can be interrupted by pressing ^C. When you do that, you immediately see this message:

^C2022-04-05T13:24:41.000713801Z: open pidfd: No such process

followed exactly 5 seconds later by this:

ERRO[0094] No exit file for container 60fd89b9fad660d41b7d2d0f00c27b732aa5c77beb7cfc65db905a25ffaa332b found: timed out waiting for file /run/user/1001/libpod/tmp/exits/60fd89b9fad660d41b7d2d0f00c27b732aa5c77beb7cfc65db905a25ffaa332b: internal libpod error 
ERRO[0094] forwarding signal 2 to container 60fd89b9fad660d41b7d2d0f00c27b732aa5c77beb7cfc65db905a25ffaa332b: error sending signal to container 60fd89b9fad660d41b7d2d0f00c27b732aa5c77beb7cfc65db905a25ffaa332b: `/usr/bin/crun kill 60fd89b9fad660d41b7d2d0f00c27b732aa5c77beb7cfc65db905a25ffaa332b 2` failed: exit status 1 

…followed by an exit with status 130.

Describe the results you expected:

Not hanging.

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

Client:       Podman Engine
Version:      4.0.0-rc4
API Version:  4.0.0-rc4
Go Version:   go1.18beta2

Built:      Fri Feb 11 15:51:09 2022
OS/Arch:    linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.24.0
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.0-2.fc36.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.0, commit: '
  cpus: 8
  distribution:
    distribution: fedora
    variant: silverblue
    version: "36"
  eventLogger: journald
  hostname: x1
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
  kernel: 5.17.0-0.rc7.116.fc36.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 13072580608
  memTotal: 33350156288
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.4.2-2.fc36.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.4.2
      commit: f6fbc8f840df1a414f31a60953ae514fa497c748
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    path: /run/user/1001/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-0.2.beta.0.fc36.x86_64
    version: |-
      slirp4netns version 1.2.0-beta.0
      commit: 477db14a24ff1a3de3a705e51ca2c4c1fe3dda64
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 8000892928
  swapTotal: 8589930496
  uptime: 165h 18m 1.58s (Approximately 6.88 days)
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /var/home/lis/.config/containers/storage.conf
  containerStore:
    number: 43
    paused: 0
    running: 20
    stopped: 23
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/home/lis/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 8
  runRoot: /run/user/1001/containers
  volumePath: /var/home/lis/.local/share/containers/storage/volumes
version:
  APIVersion: 4.0.0-rc4
  Built: 1644591069
  BuiltTime: Fri Feb 11 15:51:09 2022
  GitCommit: ""
  GoVersion: go1.18beta2
  OsArch: linux/amd64
  Version: 4.0.0-rc4

Package info (e.g. output of rpm -q podman or apt list podman):

podman-4.0.0-0.6.rc4.fc36.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

No

Additional environment details (AWS, VirtualBox, physical, etc.):

Default install of Silverblue 36, standard issue RH ThinkPad X1 Gen 9.

Since this is potentially a bug in conmon again, here are the relevant versions:

bash-5.1$ conmon --version
conmon version 2.1.0
commit: 
bash-5.1$ rpm -q conmon
conmon-2.1.0-2.fc36.x86_64

Most upvoted comments

This is not really a podman problem; the issue is that conmon crashes with a segfault, so it has to be fixed in conmon.

Is this still an issue, or did it get lost in the issue flood?

I fixed it by redirecting the log output into a file in my entrypoint script. (Using podman 4.5.1)
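
The script itself isn't shown above; as a rough sketch of that kind of workaround (run-build.sh and the log path are just placeholders), the idea is that output redirected to a file inside the container never goes through conmon's stdout/stderr handling:

#!/bin/bash
# hypothetical entrypoint.sh: keep the chatty build output out of the
# container's stdout so conmon never has to forward it to journald
exec ./run-build.sh > /var/log/build.log 2>&1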

Still an issue with 4.4.3, using a kube YAML with a bash script as the entrypoint to compile AOSP.

Compiling AOSP produces a lot of logging output on the terminal. The container just “freezes”; I can still access it with exec, but the processes consume 0% CPU. The log output of course hangs as well, since there is 0% CPU usage.

I’d vote for changing the default for --interactive mode to --log-driver=none, and allowing opt-in to interactive-mode logging by setting it to some other value.
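
If someone needs that behaviour today, it can already be requested per container; a hedged example, combining the reproducer from this issue with the none log driver that podman info lists under plugins.log:

# opt out of conmon's journald log forwarding for this one container
podman run --rm --log-driver=none docker.io/library/node:latest tar c / >/dev/null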

I can reproduce. Conmon crashes with SIGSEGV. I see the following in strace output:

24733 write(2, "[conmon:w]: Unknown error -14 Bad file descriptor\n", 50) = 50
24733 close(18)                         = 0
24733 write(2, "[conmon:w]: write_journald failed\n", 34) = 34
...
24733 sendmsg(19, {msg_name={sa_family=AF_UNIX, sun_path="/run/systemd/journal/socket"}, msg_namelen=30, msg_iov=[{iov_base="MESSAGE=\350\253\341\376\377\213\0\0\0\37\204\0\0\0\0\0\221\1\0\0\0\0\0\0I\33\0\0\0\0\0\0G\320*\274\362O\200k`\232\264\1\0\0\0\0`\232\264\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=851}, {iov_base="\n", iov_len=1}, {iov_base="CONTAINER_ID_FULL=6d3a03f10916fa47418398fdc88b1d023c57bb93557027ea1d89063f08671473", iov_len=82}, {iov_base="\n", iov_len=1}, {iov_base="PRIORITY=6", iov_len=10}, {iov_base="\n", iov_len=1}, {iov_base="CONTAINER_ID=6d3a03f10916", iov_len=25}, {iov_base="\n", iov_len=1}, {iov_base="CONTAINER_NAME=strange_curie", iov_len=28}, {iov_base="\n", iov_len=1}, {iov_base="SYSLOG_IDENTIFIER=strange_curie", iov_len=31}, {iov_base="\n", iov_len=1}, {iov_base="CODE_FILE=src/ctr_logging.c", iov_len=27}, {iov_base="\n", iov_len=1}, {iov_base="CODE_LINE=323", iov_len=13}, {iov_base="\n", iov_len=1}, {iov_base="CODE_FUNC=write_journald", iov_len=24}, {iov_base="\n", iov_len=1}], msg_iovlen=18, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 1100
24733 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x1b5f000} ---
...
24733 +++ killed by SIGSEGV (core dumped) +++
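
The exact strace invocation isn't shown above; one plausible way to capture a similar trace, assuming the conmon process for the hung container is the most recently started one, would be:

# attach to the newest conmon process, follow forks, log syscalls to a file
strace -f -tt -p "$(pgrep -n conmon)" -o conmon.strace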

It works correctly with --log-driver k8s-file instead of the journald driver.
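
Applied to the reproducer from this issue, that workaround would look like:

# same reproducer, but with the k8s-file log driver instead of journald
podman run --rm --log-driver k8s-file docker.io/library/node:latest tar c / >/dev/null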

@mheon @haircommander PTAL