podman: Rootless 'podman rm --force' fails with 'given PIDs did not die within timeout'

/kind bug

Description

Sometimes podman rm --force <container> fails to remove a running container that once had an active exec session but no longer does. After the first failure, the container is marked as Exited, but podman rm --force keeps failing on every later attempt.

$ podman ps --all
CONTAINER ID  IMAGE                                             COMMAND               CREATED         STATUS                      PORTS  NAMES
025abd4217ba  registry.fedoraproject.org/f30/fedora-toolbox:30  toolbox --verbose...  27 minutes ago  Exited (143) 5 minutes ago         fedora-toolbox-30
$ podman --log-level debug rm --force fedora-toolbox-30
INFO[0000] running as rootless                          
DEBU[0000] using conmon: "/usr/libexec/podman/conmon"   
DEBU[0000] Initializing boltdb state at /var/home/rishi/.local/share/containers/storage/libpod/bolt_state.db 
DEBU[0000] Using graph driver overlay                   
DEBU[0000] Using graph root /var/home/rishi/.local/share/containers/storage 
DEBU[0000] Using run root /tmp/1000                     
DEBU[0000] Using static dir /var/home/rishi/.local/share/containers/storage/libpod 
DEBU[0000] Using tmp dir /run/user/1000/libpod/tmp      
DEBU[0000] Using volume path /var/home/rishi/.local/share/containers/storage/volumes 
DEBU[0000] Set libpod namespace to ""                   
DEBU[0000] [graphdriver] trying provided driver "overlay" 
DEBU[0000] overlay: mount_program=/usr/bin/fuse-overlayfs 
DEBU[0000] backingFs=extfs, projectQuotaSupported=false, useNativeDiff=false, usingMetacopy=false 
DEBU[0000] Initializing event backend journald          
DEBU[0000] using runtime "/usr/bin/runc"                
DEBU[0000] Setting maximum rm workers to 16             
DEBU[0000] Killing all processes in container 025abd4217ba14e95ceea799cee9e29a0d446b76d08765b724371c3cc3ed67d7 with SIGTERM 
WARN[0000] no such directory for freezer.state          
WARN[0000] no such directory for freezer.state          
WARN[0010] Timed out stopping container 025abd4217ba14e95ceea799cee9e29a0d446b76d08765b724371c3cc3ed67d7 exec sessions 
DEBU[0010] Killing all processes in container 025abd4217ba14e95ceea799cee9e29a0d446b76d08765b724371c3cc3ed67d7 with SIGKILL 
WARN[0000] no such directory for freezer.state          
WARN[0000] no such directory for freezer.state          
DEBU[0015] Failed to remove container 025abd4217ba14e95ceea799cee9e29a0d446b76d08765b724371c3cc3ed67d7: failed to kill container 025abd4217ba14e95ceea799cee9e29a0d446b76d08765b724371c3cc3ed67d7 exec sessions: given PIDs did not die within timeout 
DEBU[0015] Worker#0 finished job [(*LocalRuntime) RemoveContainers func1]/025abd4217ba14e95ceea799cee9e29a0d446b76d08765b724371c3cc3ed67d7 (failed to kill container 025abd4217ba14e95ceea799cee9e29a0d446b76d08765b724371c3cc3ed67d7 exec sessions: given PIDs did not die within timeout) 
DEBU[0015] Pool[rm, 025abd4217ba14e95ceea799cee9e29a0d446b76d08765b724371c3cc3ed67d7: failed to kill container 025abd4217ba14e95ceea799cee9e29a0d446b76d08765b724371c3cc3ed67d7 exec sessions: given PIDs did not die within timeout] 
ERRO[0015] failed to kill container 025abd4217ba14e95ceea799cee9e29a0d446b76d08765b724371c3cc3ed67d7 exec sessions: given PIDs did not die within timeout

Additional information you deem important (e.g. issue happens only occasionally):

This doesn’t happen reliably, only every once in a while. I believe I only started seeing it with podman-1.5.0.

Output of podman version:

Version:            1.5.0
RemoteAPI Version:  1
Go Version:         go1.12.7
OS/Arch:            linux/amd64

Output of podman info --debug:

debug:
  compiler: gc
  git commit: ""
  go version: go1.12.7
  podman version: 1.5.0
host:
  BuildahVersion: 1.10.1
  Conmon:
    package: podman-1.5.0-2.fc30.x86_64
    path: /usr/libexec/podman/conmon
    version: 'conmon version 2.0.0, commit: 7e8f10c28723d67281b1dd11d5dac8edf29ca3d0-dirty'
  Distribution:
    distribution: fedora
    version: "30"
  MemFree: 9611743232
  MemTotal: 16530231296
  OCIRuntime:
    package: runc-1.0.0-93.dev.gitb9b6cc6.fc30.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc8+dev
      commit: e3b4c1108f7d1bf0d09ab612ea09927d9b59b4e3
      spec: 1.0.1-dev
  SwapFree: 8414818304
  SwapTotal: 8414818304
  arch: amd64
  cpus: 4
  eventlogger: journald
  hostname: bollard
  kernel: 5.2.7-200.fc30.x86_64
  os: linux
  rootless: true
  uptime: 36m 35.57s
registries:
  blocked: null
  insecure: null
  search:
  - docker.io
  - registry.fedoraproject.org
  - quay.io
  - registry.access.redhat.com
  - registry.centos.org
store:
  ConfigFile: /var/home/rishi/.config/containers/storage.conf
  ContainerStore:
    number: 1
  GraphDriverName: overlay
  GraphOptions:
  - overlay.mount_program=/usr/bin/fuse-overlayfs
  GraphRoot: /var/home/rishi/.local/share/containers/storage
  GraphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  ImageStore:
    number: 2
  RunRoot: /tmp/1000
  VolumePath: /var/home/rishi/.local/share/containers/storage/volumes

Additional environment details (AWS, VirtualBox, physical, etc.):

I have only seen this happen on Fedora 30 hosts, possibly because I have only tried podman-1.5.0 on Fedora 30.

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 24 (17 by maintainers)

Most upvoted comments

I can pick this one up along with https://github.com/containers/libpod/issues/5014. I believe I have a fix.