podman: [Bug]: podman pod create command hangs indefinitely

Issue Description

The podman pod create command hangs indefinitely and causes all other podman commands to hang.

Steps to reproduce the issue

  1. Create a podman network.
podman network create --ipv6
  2. Create a pod.
podman pod create \
  --name miniflux \
  --network podman1 \
  --replace \
  --userns keep-id

Describe the results you received

The podman pod create command hangs indefinitely, and while it is hanging, every other podman command, such as podman ps, hangs as well.

Describe the results you expected

I expected podman pod create to finish almost immediately, or at most within a few minutes.

podman info output

host:
  arch: arm64
  buildahVersion: 1.28.0
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.5-1.fc37.aarch64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.5, commit: '
  cpuUtilization:
    idlePercent: 86.4
    systemPercent: 6.25
    userPercent: 7.36
  cpus: 6
  distribution:
    distribution: fedora
    variant: iot
    version: "37"
  eventLogger: journald
  hostname: rockpro64.jwillikers.io
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.1.6-200.fc37.aarch64
  linkmode: dynamic
  logDriver: journald
  memFree: 852598784
  memTotal: 3994365952
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.7.2-3.fc37.aarch64
    path: /usr/bin/crun
    version: |-
      crun version 1.7.2
      commit: 0356bf4aff9a133d655dc13b1d9ac9424706cac4
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-8.fc37.aarch64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 2858676224
  swapTotal: 3994021888
  uptime: 53h 18m 24.00s (Approximately 2.21 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /var/home/jordan/.config/containers/storage.conf
  containerStore:
    number: 12
    paused: 0
    running: 11
    stopped: 1
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/jordan/.local/share/containers/storage
  graphRootAllocated: 123364966400
  graphRootUsed: 80927322112
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 15
  runRoot: /run/user/1000/containers
  volumePath: /home/jordan/.local/share/containers/storage/volumes
version:
  APIVersion: 4.3.1
  Built: 1668178831
  BuiltTime: Fri Nov 11 09:00:31 2022
  GitCommit: ""
  GoVersion: go1.19.2
  Os: linux
  OsArch: linux/arm64
  Version: 4.3.1

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

No

Additional environment details

RockPro64 / aarch64

Fedora IoT 35 - 37

Additional information

Backing storage for containers runs on an NFS-mounted volume and an S3-mounted volume, both mounted directly by Linux. Several containers and one pod run on the system, managed via systemd, without problems.

I think this may be related to #10269.

About this issue

  • State: open
  • Created a year ago
  • Comments: 31 (23 by maintainers)

Most upvoted comments

OK. Probably a lock conflict, then. We added some detection around that (ErrWillDeadlock gets thrown in some places), but it seems like pod creation and pod removal don’t have that.

It seems this still needs more work to prevent the issue from arising.

But that’s for @giuseppe, @Luap99, and @mheon to say.

Thanks. So the NFS mount is used only for volumes.
