podman: rootless service does not work with systemd socket activation

/kind bug

Description

podman system service is broken in rootless mode: it does not get the socket from the systemd user session, and the system log fills with noise because the service restarts continuously.

Steps to reproduce the issue:

Run the commands as a non-root user, so that podman runs in rootless mode:

  1. $ systemctl --user start podman.socket

  2. $ nc -U /run/user/$UID/podman/podman.sock

  3. $ journalctl --user -f   # watch the log

Describe the results you received: the log shows level=info msg="using API endpoint: 'unix:/run/user/1000/podman/podman.sock'", which means socket activation does not work, and systemd keeps starting the podman service over and over.

Describe the results you expected: the log should show level=info msg="using systemd socket activation to determine API endpoint"

Additional information you deem important (e.g. issue happens only occasionally): running podman system service as root works; the service gets the unix socket from systemd correctly.

Output of podman version:

Version:      3.0.0-dev
API Version:  3.0.0
Go Version:   go1.16rc1
Built:        Wed Feb  3 12:07:25 2021
OS/Arch:      linux/amd64

Package info (e.g. output of rpm -q podman or apt list podman):

podman-3.0.0-0.204.dev.gita086f60.fc34.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?

Yes

Additional environment details (AWS, VirtualBox, physical, etc.): System: Fedora rawhide

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 21 (18 by maintainers)

Most upvoted comments

I dug into this bug further and found the root cause in pkg/systemd/activation.go

In rootless mode, podman forks and execs a podman process in a new user namespace, but systemd sets LISTEN_PID to the PID of the parent, so this check fails in the child:

	if err != nil || p != os.Getpid() {
		return false
	}

In root mode no subprocess is forked, so socket activation works.
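To make the failure concrete, here is a minimal sketch of the PID comparison described above. The function names `activationIntendedFor` and `socketActivated` are made up for illustration and are not podman's actual API; the real check in pkg/systemd/activation.go may inspect more than LISTEN_PID.

```go
package main

import (
	"os"
	"strconv"
)

// activationIntendedFor reports whether a LISTEN_PID value names the
// process with the given PID. (Hypothetical helper, not podman code.)
func activationIntendedFor(listenPID string, pid int) bool {
	p, err := strconv.Atoi(listenPID)
	if err != nil || p != pid {
		// Unset, malformed, or addressed to another process
		// (for example the parent that systemd started).
		return false
	}
	return true
}

// socketActivated applies the comparison to the current process.
func socketActivated() bool {
	return activationIntendedFor(os.Getenv("LISTEN_PID"), os.Getpid())
}
```

With LISTEN_PID still holding the parent's PID, the re-exec'd child compares it against its own, different PID and gets false, which matches the behaviour reported here.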

#9855 does not fix this. @giuseppe @mheon I’ll take this, although right now I’m overwhelmed by the complexity of the rootless setup code.

Hello @pendulm, I tested your pull request and it works like a charm. Good job, man.

Commit 3f60dc02e36a144473c494adebde781e5dec77fa adjusted LISTEN_PID to the child PID in reexec_in_user_namespace, but when a pause process is left behind for namespace pinning (see https://github.com/containers/podman/pull/7133), the code path is reexec_userns_join, so we should adjust LISTEN_PID in that function as well.
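The adjustment itself amounts to overwriting one environment variable with the PID of the process that will actually serve the socket. The following is only an illustrative sketch in Go: `withListenPID` is a hypothetical helper, and the real change lives in the rootless re-exec code (reexec_in_user_namespace and reexec_userns_join), not in a function like this.

```go
package main

import (
	"strconv"
	"strings"
)

// withListenPID returns a copy of env (KEY=VALUE entries) in which an
// existing LISTEN_PID entry is rewritten to pid. Entries are otherwise
// left untouched, and LISTEN_PID is not added if it was absent.
// (Hypothetical helper for illustration only.)
func withListenPID(env []string, pid int) []string {
	out := make([]string, 0, len(env))
	for _, kv := range env {
		if strings.HasPrefix(kv, "LISTEN_PID=") {
			kv = "LISTEN_PID=" + strconv.Itoa(pid)
		}
		out = append(out, kv)
	}
	return out
}
```

Whichever code path re-executes podman, the point is the same: the process that ends up running the service must see its own PID in LISTEN_PID.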

It seems we have a check in rootless_linux to rewrite LISTEN_PID to the current pid.

Would it be possible for you to check why it is failing?

I think this is normal, since the podman service has a 5 second timeout by default.

podman system service --timeout 5000

EDIT: Never mind, you meant that it fails to hand over the socket within that timeout ?

Yep. podman exiting after the timeout is normal and expected: when traffic comes in, systemd activates the service, and when it is idle, the service shuts down.

But with the bug, things go wrong:

  1. When traffic comes in, systemd activates the service, but the podman service fails to get the socket, so it creates a socket by itself.
  2. No traffic arrives on the newly created socket, so the podman service shuts down after the timeout.
  3. systemd finds that the service has stopped and that the traffic on the original socket has not been accepted, so it starts a new podman service again.
  4. The podman service keeps restarting until podman.socket is stopped manually.