podman: systemd sd-notify freezes podman

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

Creating a service using Type=notify freezes everything until the container is ready.

Setting a service to Type=notify makes systemd set NOTIFY_SOCKET, which podman passes through properly. runc and crun then proxy that NOTIFY_SOCKET into the container, so the container itself signals when it is ready. The whole idea is that starting a container does not equal “a container is ready”: initialization could take seconds, tens of seconds, or minutes, and it shouldn’t matter how long it takes.
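For readers unfamiliar with the protocol, here is a minimal sketch of what "signalling ready" looks like from inside the workload. It assumes the process can reach the socket named in NOTIFY_SOCKET; the function name sd_notify is chosen for illustration only and is not podman code.

```python
import os
import socket


def sd_notify(state: str) -> bool:
    """Send one sd_notify-style datagram; return False if NOTIFY_SOCKET is unset."""
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False  # not started under a Type=notify service
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        addr = b"\0" + path[1:].encode()
    else:
        addr = path
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), addr)
    return True


if __name__ == "__main__":
    # ... slow initialization would happen here ...
    sd_notify("READY=1")
```

Until a message like this arrives, systemd keeps the unit in the "activating" state, which is the window in which the freeze described below shows up.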

The problem is that while the container is in “starting” status, podman is completely frozen. podman ps doesn’t return, podman exec doesn’t work, and even podman info won’t return. One partially initialized container shouldn’t freeze everything, and the lack of exec makes it hard to diagnose what’s going on inside the container to resolve the sd-notify issue. podman stop and podman kill appear to work, but the container is still stuck.
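A small probe with a timeout makes the hang easy to demonstrate without tying up a terminal. This is only a sketch: the helper name probe and the 5-second default are arbitrary choices, not anything podman provides.

```python
import subprocess
import sys


def probe(cmd: list[str], timeout: float = 5.0) -> str:
    """Run cmd and classify it as 'ok', 'failed', 'hung', or 'missing'."""
    try:
        result = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "hung"  # subprocess.run kills the child before raising
    except FileNotFoundError:
        return "missing"  # the binary is not installed or not on PATH
    return "ok" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    for args in (["podman", "ps"], ["podman", "info"]):
        print(" ".join(args), "->", probe(args))
```

With the bug present, both commands would be expected to report "hung" while the notify container is still starting.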

In addition, the MAINPID isn’t set right - but we’ll come back to that.

Steps to reproduce the issue:

  1. Create a systemd service. Change the type from Type=forking to Type=notify, remove the PIDFile, and add NotifyAccess=all. SSCCE:
[Unit]
Description=NotifyTesting
Wants=network.target
After=network-online.target

[Service]
Environment=PODMAN_SYSTEMD_UNIT=%n
SyslogIdentifier=%N
ExecStartPre=-/usr/bin/podman stop %N
ExecStartPre=-/usr/bin/podman rm %N
LogExtraFields=CONTAINER_NAME=%N
ExecStart=/usr/bin/podman --log-level=debug run \
  -d --log-driver=journald \
  --init \
  --cgroups no-conmon \
  --net=host \
  --name %N \
  alpine sleep infinity
ExecStop=/usr/bin/podman stop -t 20 %N
Type=notify
NotifyAccess=all
Restart=on-failure
#Restart=always
RestartSec=30s
TimeoutStartSec=20
TimeoutStopSec=25
#KillMode=none
#Type=forking
#PIDFile=/run/podman-pid-%n
Delegate=yes
Slice=machine.slice

[Install]
WantedBy=multi-user.target default.target
  2. Run the service.

  3. Note the status:

notifytest[8595]: time="2020-06-19T13:22:38Z" level=debug msg="Starting container e6043f58bcd610d1e448739f2120447f2880c9b498c65fc3c181e1f453a48ef7 with command

It never reaches “started” (as expected, since nothing in the container sends the ready notification).

Status:

 Main PID: 12003 (podman)
    Tasks: 22 (limit: 4915)
   Memory: 27.9M
   CGroup: /machine.slice/notifytest.service
           ├─12003 /usr/share/gocode/src/github.com/containers/libpod/bin/podman --log-level=debug run -d --log-driver=journald --init --cgroups no-conmon --net=host --name notifytest alpine sleep infinity
           ├─12065 /usr/libexec/podman/conmon --api-version 1 -c a3e79ea772bdcca69020eca158f059718ff0f4b34dd1c8f8af5e77c6840e60f0 -u a3e79ea772bdcca69020eca158f059718ff0f4b34dd1c8f8af5e77c6840e60f0 -r /usr/bin/runc -b /va>
           └─12084 /usr/bin/runc start a3e79ea772bdcca69020eca158f059718ff0f4b34dd1c8f8af5e77c6840e60f0
  4. Try podman ps, podman exec -it -l sh, or podman info

  5. Freezes.

  6. Release the container:

for i in /run/crun/*/notify/notify /run/runc/*/notify/notify.sock; do env NOTIFY_SOCKET=$i systemd-notify --ready; done

  7. Everything releases, but the service STILL fails, because it didn’t find MAINPID=conmon’s pid.
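To illustrate the MAINPID point: the sd_notify protocol lets a single datagram carry several newline-separated assignments, so the PID and the ready state can be reported together. The sketch below assumes conmon's PID is available from a pidfile (for example one written via podman run --conmon-pidfile) and that the message goes to the socket systemd handed to the service. The helper names and any path passed in are hypothetical, and this is not how podman itself handles it.

```python
import os
import socket
from typing import Optional


def read_pid(pidfile: str) -> int:
    """Read a PID from a pidfile (assumed to contain a single integer)."""
    with open(pidfile) as f:
        return int(f.read().strip())


def build_notify_message(main_pid: Optional[int] = None, ready: bool = True) -> bytes:
    """Build a newline-separated sd_notify payload."""
    fields = []
    if main_pid is not None:
        fields.append(f"MAINPID={main_pid}")
    if ready:
        fields.append("READY=1")
    return "\n".join(fields).encode()


def notify(socket_path: str, message: bytes) -> None:
    """Send the payload as one datagram to a filesystem notify socket."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, socket_path)
```

A wrapper could then call notify(os.environ["NOTIFY_SOCKET"], build_notify_message(read_pid(pidfile))) once the container reports ready. Whether systemd accepts a MAINPID change from that sender depends on the unit's NotifyAccess setting.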

Output of podman version:

Version:      2.0.0-dev
API Version:  1
Go Version:   go1.13.3
Git Commit:   b27df834c18b08bb68172fa5bd5fd12a5cd54633
Built:        Thu Jun 18 12:19:01 2020
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.15.0
  cgroupVersion: v1
  conmon:
    package: Unknown
    path: /usr/libexec/podman/conmon
    version: 'conmon version 2.0.18-dev, commit: 954b05a7908c0aeeff007ebd19ff662e20e5f46f'
  cpus: 4
  distribution:
    distribution: photon
    version: "3.0"
  eventLogger: file
  hostname: photon-machine
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 4.19.115-6.ph3-esx
  linkmode: dynamic
  memFree: 5536968704
  memTotal: 8359960576
  ociRuntime:
    name: runc
    package: runc-1.0.0.rc9-2.ph3.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc10+dev
      commit: 2a0466958d9af23af2ad12bd79d06ed0af4091e2
      spec: 1.0.2-dev
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  rootless: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 0
  swapTotal: 0
  uptime: 25h 9m 27.25s (Approximately 1.04 days)
registries:
  search:
  - docker.io
  - registry.fedoraproject.org
  - registry.access.redhat.com
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 4
    paused: 0
    running: 0
    stopped: 4
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageStore:
    number: 16
  runRoot: /var/run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 1
  Built: 1592482741
  BuiltTime: Thu Jun 18 12:19:01 2020
  GitCommit: b27df834c18b08bb68172fa5bd5fd12a5cd54633
  GoVersion: go1.13.3
  OsArch: linux/amd64
  Version: 2.0.0-dev

Package info (e.g. output of rpm -q podman or apt list podman):

Compiled from source.

Additional environment details (AWS, VirtualBox, physical, etc.):

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 55 (46 by maintainers)

Most upvoted comments

What’s the design? What are the guiding principles?

In my mind:

  1. One container should never influence operations on another container.
  2. One container’s startup should never freeze podman, especially for non-container-specific operations like podman info.

If these principles are incorrect, please let me know. If they are correct, then the current SD_NOTIFY support violates both. And I HAVE provided a solution. So the only questions are:

  1. Is my solution the right direction?
  2. Are you going to merge them?