podman: most podman commands as user abort with "Error: cannot re-exec process" after upgrade from 2.1.1~2 to 2.2.0~2

/kind bug

Description

Most podman commands, when run as a non-root user, abort with “Error: cannot re-exec process” after upgrading from 2.1.1~2 to 2.2.0~2.

Steps to reproduce the issue:

  1. Upgrade the deb package from 2.1.1~2 to 2.2.0~2
  2. As a non-root user, run podman version

Describe the results you received:

wuxxin@zap:~$ podman version
Error: cannot re-exec process

Describe the results you expected:

sudo podman version
Version:      2.2.0
API Version:  2.1.0
Go Version:   go1.15.2
Built:        Thu Jan  1 01:00:00 1970
OS/Arch:      linux/amd64

Additional information you deem important (e.g. issue happens only occasionally):

OS: Ubuntu 20.04
Package source: deb https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_20.04 /

Output of sudo podman info --debug (as a non-root user, podman info --debug aborts with ‘Error: cannot re-exec process’):

host:
  arch: amd64
  buildahVersion: 1.18.0
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: 'conmon: /usr/libexec/podman/conmon'
    path: /usr/libexec/podman/conmon
    version: 'conmon version 2.0.20, commit: '
  cpus: 4
  distribution:
    distribution: ubuntu
    version: "20.04"
  eventLogger: journald
  hostname: zap
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.4.0-56-generic
  linkmode: dynamic
  memFree: 4068319232
  memTotal: 16781094912
  ociRuntime:
    name: crun
    package: 'crun: /usr/bin/crun'
    path: /usr/bin/crun
    version: |-
      crun version UNKNOWN
      commit: 3e46dd849fdf6bfa68127786e073318184641f05
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  rootless: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 9661575168
  swapTotal: 9661575168
  uptime: 18h 29m 29.46s (Approximately 0.75 days)
registries:
  search:
  - docker.io
  - quay.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: 'fuse-overlayfs: /usr/bin/fuse-overlayfs'
      Version: |-
        fusermount3 version: 3.9.0
        fuse-overlayfs: version 1.1.0
        FUSE library version 3.9.0
        using FUSE kernel interface version 7.31
    overlay.mountopt: nodev
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: zfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 20
  runRoot: /var/run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 2.1.0
  Built: 0
  BuiltTime: Thu Jan  1 01:00:00 1970
  GitCommit: ""
  GoVersion: go1.15.2
  OsArch: linux/amd64
  Version: 2.2.0

Package info (e.g. output of rpm -q podman or apt list podman):

LANG=POSIX apt list podman podman-rootless podman-plugins | grep amd64
podman-plugins/unknown,now 1.1.1~1 amd64 [installed]
podman-rootless/unknown,now 2.2.0~2 amd64 [installed]
podman/unknown 2.2.0~2 amd64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?

Yes

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 27 (12 by maintainers)

Most upvoted comments

@rhatdan I just ran into this, and the solution ended up being to rm -rf "/tmp/run-${UID}". I think a container process had been interrupted somehow, leaving /tmp/run-$UID/libpod/pause.pid behind. I’m not sure how that leads to this specific error, but opening that file was the third-to-last openat call in the strace of the failing podman run, so I’m guessing the pause.pid file was stale.

Here are the last few lines of the strace. The final call, openat(AT_FDCWD, "/proc/7376/ns/user", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory), referred to a PID that didn’t match any running process, so I’m guessing 7376 was the value stored in pause.pid.

newfstatat(AT_FDCWD, "/usr/local/bin/newgidmap", 0xc0005bc788, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/bin/newgidmap", {st_mode=S_IFREG|0755, st_size=44760, ...}, 0) = 0
openat(AT_FDCWD, "/dev/null", O_RDONLY|O_CLOEXEC) = 11
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=50246, si_uid=5008, si_status=0, si_utime=0, si_stime=0} ---
openat(AT_FDCWD, "/tmp/run-5008/libpod/pause.pid", O_RDONLY|O_CLOEXEC) = 11
getcwd("/home/awspruner", 4096)         = 16
openat(AT_FDCWD, "/proc/self/cmdline", O_RDONLY) = 11
openat(AT_FDCWD, "/proc/7376/ns/user", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)

Edit: Worth noting that the original reporter’s issue was probably fixed when their /tmp got cleaned out, for example on reboot.
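
For anyone who wants to confirm the stale pause.pid theory before deleting the whole run directory, here is a minimal sketch of the check. It is not podman code. It assumes, as guessed from the strace above, that pause.pid holds a single PID and that the failure comes from /proc/<pid>/ns/user not existing for that PID. The function name pause_pid_is_stale and its proc_root parameter are hypothetical and only there for illustration.

```python
import os


def pause_pid_is_stale(pause_pid_file, proc_root="/proc"):
    """Guess whether a pause.pid file points at a process that no longer exists.

    Hypothetical helper, not part of podman. Assumes the file contains a
    single PID, as the strace in this thread suggests.
    """
    try:
        with open(pause_pid_file) as fh:
            pid = int(fh.read().strip())
    except FileNotFoundError:
        return False  # no pause.pid at all, so nothing stale to clean up
    except ValueError:
        return True  # the file exists but does not hold a PID

    # Mirrors the failing openat() in the strace: if the recorded PID has no
    # user-namespace entry under /proc, treat the file as stale.
    # os.path.exists returns False on any OSError, including ENOENT.
    return not os.path.exists(os.path.join(proc_root, str(pid), "ns", "user"))


if __name__ == "__main__":
    # Path layout taken from the strace above (/tmp/run-<uid>/libpod/pause.pid).
    candidate = f"/tmp/run-{os.getuid()}/libpod/pause.pid"
    state = "stale" if pause_pid_is_stale(candidate) else "not stale (or absent)"
    print(candidate, state)
```

If this reports the file as stale, removing just that file (or the whole /tmp/run-$UID directory, as I did) should be the cleanup to try. I have only verified the rm -rf variant.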