podman: User Podman Services (podman.service/podman.socket) fail within 24 hrs

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

User podman services (podman.socket and podman.service) fail within 24 hours of a system reboot. While user podman containers continue to run, systemctl reports both units as failed.

Output from podman.service journal:

Jun 07 22:50:27 local.lan systemd[1234]: Failed to start Podman API Service.
Jun 07 22:50:27 local.lan systemd[1234]: podman.service: Failed to allocate exec_fd pipe: Too many open files
Jun 07 22:50:27 local.lan systemd[1234]: podman.service: Failed to run 'start' task: Too many open files
Jun 07 22:50:27 local.lan systemd[1234]: podman.service: Failed with result 'resources'.
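
A quick way to see whether the service process is running out of file descriptors is to compare the unit's NOFILE limit against the number of FDs its main process currently holds (a diagnostic sketch; it assumes podman.service is actually running at the moment you check):

# Show the unit's file-descriptor limit
systemctl --user show podman.service -p LimitNOFILE
# Count the FDs currently held by the service's main process
ls -1 /proc/$(systemctl --user show podman.service -p MainPID --value)/fd | wc -l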

Output from podman.socket journal:

Jun 07 22:50:35 local.lan systemd[1234]: Listening on Podman API Socket.
Jun 07 22:50:36 local.lan systemd[1234]: podman.socket: Trigger limit hit, refusing further activation.
Jun 07 22:50:36 local.lan systemd[1234]: podman.socket: Failed with result 'trigger-limit-hit'.
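
The trigger limit itself comes from systemd: a socket unit refuses further activation once it has been triggered more than TriggerLimitBurst= times within TriggerLimitIntervalSec= (see systemd.socket(5)), which is what happens when the triggered service keeps failing. The limits can be loosened with a drop-in, though that only masks the underlying failure (a workaround sketch, not a fix):

# Run: systemctl --user edit podman.socket, then add:
[Socket]
TriggerLimitIntervalSec=10
TriggerLimitBurst=1000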

Both failures look similar to previously closed issues (https://github.com/containers/podman/issues/6093 and https://github.com/containers/podman/issues/5150), but (unless I'm reading them wrong) the fixes for those issues were merged a while ago.

Steps to reproduce the issue:

  1. Generate a rootless container (I started ‘docker.io/thelounge/thelounge:latest’) and create a corresponding user systemd unit (see the command sketch after this list).

  2. Let it run for 24 hours.

  3. Run systemctl --user status - the system will show as degraded. Running systemctl --user list-units --failed shows both podman.socket and podman.service as failed.
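
For reference, step 1 can be reproduced roughly like this (a sketch; container-thelounge.service is the unit name podman generate systemd produces by default for a container named thelounge):

podman run -d --name thelounge docker.io/thelounge/thelounge:latest
mkdir -p ~/.config/systemd/user
podman generate systemd --new --name thelounge > ~/.config/systemd/user/container-thelounge.service
systemctl --user daemon-reload
systemctl --user enable --now container-thelounge.service podman.socket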

Describe the results you received: Podman systemd units failed.

Describe the results you expected: Podman services to continue working normally.

Additional information you deem important (e.g. issue happens only occasionally): Both appear to be online and working at system start.
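
One way to confirm the socket is answering at start is to ping the compat API over the unix socket (a sketch; the socket path is the one reported by podman info below). Note that a connection like this can itself (re)start podman.service via socket activation:

curl -s --unix-socket /run/user/1000/podman/podman.sock http://d/_ping
# prints: OK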

Output of podman version:

podman version 3.1.0-dev

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.19.8
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: conmon-2.0.27-1.module_el8.5.0+733+9bb5dffa.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.27, commit: dc08a6edf03cc2dadfe803eac14b896b44cc4721'
  cpus: 4
  distribution:
    distribution: '"centos"'
    version: "8"
  eventLogger: file
  hostname: local.lan
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 4.18.0-305.3.1.el8.x86_64
  linkmode: dynamic
  memFree: 13275705344
  memTotal: 16480956416
  ociRuntime:
    name: runc
    package: runc-1.0.0-70.rc92.module_el8.5.0+733+9bb5dffa.x86_64
    path: /usr/bin/runc
    version: 'runc version spec: 1.0.2-dev'
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    selinuxEnabled: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.8-1.module_el8.5.0+733+9bb5dffa.x86_64
    version: |-
      slirp4netns version 1.1.8
      commit: d361001f495417b880f20329121e3aa431a8f90f
      libslirp: 4.3.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.1
  swapFree: 3670011904
  swapTotal: 3670011904
  uptime: 30h 19m 58.74s (Approximately 1.25 days)
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /home/USERNAME/.config/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.5.0-1.module_el8.5.0+733+9bb5dffa.x86_64
      Version: |-
        fusermount3 version: 3.2.1
        fuse-overlayfs: version 1.5
        FUSE library version 3.2.1
        using FUSE kernel interface version 7.26
  graphRoot: /home/USERNAME/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 1
  runRoot: /run/user/1000/containers
  volumePath: /home/USERNAME/.local/share/containers/storage/volumes
version:
  APIVersion: 3.1.0-dev
  Built: 1616783523
  BuiltTime: Fri Mar 26 11:32:03 2021
  GitCommit: ""
  GoVersion: go1.16.1
  OsArch: linux/amd64
  Version: 3.1.0-dev

Package info (e.g. output of rpm -q podman or apt list podman):

podman-3.1.0-0.13.module_el8.5.0+733+9bb5dffa.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? I checked the troubleshooting guide. While I am not on the latest version, it looks like these issues were fixed in podman 1.9.

Additional environment details (AWS, VirtualBox, physical, etc.): Physical system running CentOS Stream 8.

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 33 (15 by maintainers)

Most upvoted comments

Ah - I believe @jwhonce is working on FD leaks right now

Since podman 3.4 has been released, we believe this is now fixed.

@jwhonce Yep - that suppresses the errors - thank you. Not sure if the errors (or the number of files that are open in the container) are relevant for the service failures, but figured it was worth including.
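
For anyone trying to catch the leak in the act, a small loop like this (a sketch; it logs the FD count of podman.service once a minute and skips samples while the service is stopped) shows whether the count climbs steadily between boot and the failure:

while true; do
  pid=$(systemctl --user show podman.service -p MainPID --value)
  # MainPID is 0 while the socket-activated service is not running
  [ "$pid" != "0" ] && echo "$(date -Is) fds=$(ls -1 /proc/$pid/fd | wc -l)"
  sleep 60
done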