podman: systemd sd-notify freezes podman
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
Creating a service using Type=notify freezes everything until the container is ready.
Setting a service to Type=notify makes systemd set NOTIFY_SOCKET, which podman passes through properly. runc and crun then proxy that NOTIFY_SOCKET through to the container, meaning the container itself will signal when it is ready. The whole idea is that starting a container does not equal “a container is ready”: initialization could take seconds, tens of seconds, or minutes, and it shouldn’t matter how long it takes.
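For illustration, the readiness signal described above is a single datagram sent to the socket named in NOTIFY_SOCKET. Below is a minimal Python sketch of what a process inside the container might do once its initialization finishes. The function name sd_notify and its socket_path parameter are illustrative choices, not part of podman, runc, or crun.

```python
import os
import socket
from typing import Optional


def sd_notify(state: str, socket_path: Optional[str] = None) -> bool:
    """Send one notification datagram; return False if no socket is configured."""
    path = socket_path or os.environ.get("NOTIFY_SOCKET")
    if not path:
        # Not started under a Type=notify unit, so there is nothing to signal.
        return False
    # A leading "@" denotes a Linux abstract-namespace socket (leading NUL byte).
    addr = b"\0" + path[1:].encode() if path.startswith("@") else path
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), addr)
    return True
```

A payload would call sd_notify("READY=1") after its slow startup completes; until then the unit stays in the “starting” state.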
The problem is that while the container is in “starting” status, podman is completely frozen: podman ps doesn’t return, podman exec doesn’t work, and even podman info won’t return. One partially initialized container shouldn’t freeze everything, and the lack of exec makes it hard to diagnose what’s going on inside the container to resolve the sd-notify issue. podman stop and podman kill appear to work, but the container is still stuck.
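One way to confirm the freeze without hanging an interactive shell is to run each command under a timeout. This is a hedged Python sketch; responds_within and its 5-second default are arbitrary illustrative choices.

```python
import subprocess
from typing import List


def responds_within(cmd: List[str], timeout_s: float = 5.0) -> bool:
    """Return True if cmd exits within timeout_s seconds, False if it hangs."""
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return False
```

Based on the behavior reported here, responds_within(["podman", "ps"]) and responds_within(["podman", "info"]) would be expected to return False while a container is stuck in “starting”.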
In addition, MAINPID isn’t set correctly, but we’ll come back to that.
Steps to reproduce the issue:
- Create a systemd service. Change the type from Type=forking to Type=notify, remove the PIDFile, and add NotifyAccess=all. SSCCE:
[Unit]
Description=NotifyTesting
Wants=network.target
After=network-online.target
[Service]
Environment=PODMAN_SYSTEMD_UNIT=%n
SyslogIdentifier=%N
ExecStartPre=-/usr/bin/podman stop %N
ExecStartPre=-/usr/bin/podman rm %N
LogExtraFields=CONTAINER_NAME=%N
ExecStart=/usr/bin/podman --log-level=debug run \
-d --log-driver=journald \
--init \
--cgroups no-conmon \
--net=host \
--name %N \
alpine sleep infinity
ExecStop=/usr/bin/podman stop -t 20 %N
Type=notify
NotifyAccess=all
Restart=on-failure
#Restart=always
RestartSec=30s
TimeoutStartSec=20
TimeoutStopSec=25
#KillMode=none
#Type=forking
#PIDFile=/run/podman-pid-%n
Delegate=yes
Slice=machine.slice
[Install]
WantedBy=multi-user.target default.target
- Run the service.
- Note the status:
notifytest[8595]: time="2020-06-19T13:22:38Z" level=debug msg="Starting container e6043f58bcd610d1e448739f2120447f2880c9b498c65fc3c181e1f453a48ef7 with command
It never reaches the started state (as expected).
Status:
Main PID: 12003 (podman)
Tasks: 22 (limit: 4915)
Memory: 27.9M
CGroup: /machine.slice/notifytest.service
├─12003 /usr/share/gocode/src/github.com/containers/libpod/bin/podman --log-level=debug run -d --log-driver=journald --init --cgroups no-conmon --net=host --name notifytest alpine sleep infinity
├─12065 /usr/libexec/podman/conmon --api-version 1 -c a3e79ea772bdcca69020eca158f059718ff0f4b34dd1c8f8af5e77c6840e60f0 -u a3e79ea772bdcca69020eca158f059718ff0f4b34dd1c8f8af5e77c6840e60f0 -r /usr/bin/runc -b /va>
└─12084 /usr/bin/runc start a3e79ea772bdcca69020eca158f059718ff0f4b34dd1c8f8af5e77c6840e60f0
- Try podman ps, podman exec -it -l sh, or podman info.
- Freezes.
- Release the container:
for i in /run/crun/*/notify/notify /run/runc/*/notify/notify.sock; do env NOTIFY_SOCKET="$i" systemd-notify --ready; done
- Everything releases, but the service STILL fails, because it didn’t find MAINPID=conmon’s pid.
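As a sketch of what would satisfy systemd here: the sd_notify protocol allows a MAINPID=<pid> field alongside READY=1 in the same newline-separated message. The Python below illustrates that message format only. notify_ready_with_mainpid is a hypothetical helper, it assumes socket_path is a filesystem path to the unit's own notify socket (not the runtime's proxy socket), and how conmon's PID would be obtained is left to the caller.

```python
import socket


def notify_ready_with_mainpid(main_pid: int, socket_path: str) -> bytes:
    """Send MAINPID=<pid> and READY=1 in one datagram; return the payload sent."""
    # Fields are newline-separated KEY=VALUE assignments.
    payload = f"MAINPID={main_pid}\nREADY=1".encode()
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, socket_path)
    return payload
```

With NotifyAccess=all, as in the unit above, systemd accepts notifications from any process in the service's cgroup, so the sender does not have to be the current main process.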
Output of podman version:
Version: 2.0.0-dev
API Version: 1
Go Version: go1.13.3
Git Commit: b27df834c18b08bb68172fa5bd5fd12a5cd54633
Built: Thu Jun 18 12:19:01 2020
OS/Arch: linux/amd64
Output of podman info --debug:
host:
  arch: amd64
  buildahVersion: 1.15.0
  cgroupVersion: v1
  conmon:
    package: Unknown
    path: /usr/libexec/podman/conmon
    version: 'conmon version 2.0.18-dev, commit: 954b05a7908c0aeeff007ebd19ff662e20e5f46f'
  cpus: 4
  distribution:
    distribution: photon
    version: "3.0"
  eventLogger: file
  hostname: photon-machine
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 4.19.115-6.ph3-esx
  linkmode: dynamic
  memFree: 5536968704
  memTotal: 8359960576
  ociRuntime:
    name: runc
    package: runc-1.0.0.rc9-2.ph3.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc10+dev
      commit: 2a0466958d9af23af2ad12bd79d06ed0af4091e2
      spec: 1.0.2-dev
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  rootless: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 0
  swapTotal: 0
  uptime: 25h 9m 27.25s (Approximately 1.04 days)
registries:
  search:
  - docker.io
  - registry.fedoraproject.org
  - registry.access.redhat.com
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 4
    paused: 0
    running: 0
    stopped: 4
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageStore:
    number: 16
  runRoot: /var/run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 1
  Built: 1592482741
  BuiltTime: Thu Jun 18 12:19:01 2020
  GitCommit: b27df834c18b08bb68172fa5bd5fd12a5cd54633
  GoVersion: go1.13.3
  OsArch: linux/amd64
  Version: 2.0.0-dev
Package info (e.g. output of rpm -q podman or apt list podman):
Compiled from source.
Additional environment details (AWS, VirtualBox, physical, etc.):
About this issue
- State: closed
- Created 4 years ago
- Comments: 55 (46 by maintainers)
What’s the design? What are the guiding principles?
In my mind:
podman info
If these principles are incorrect, please let me know. If these principles are correct, then the current SD_NOTIFY support violates both. And I HAVE provided a solution. So the only questions are