podman: pod stats: unknown FS magic on "/run/user/4902/netns/netns-etc-etc"

Seen in CI, f37 rootless sqlite:

podman pod stats on a specific running pod
...
$ podman [options] run --http-proxy=false --pod cec7af2e2f8e0479a91d94bff88cf8482c2907f7644eb7c90a9ea01b2f13ff22 -d quay.io/libpod/alpine:latest top
eed6053f6f0e8a2f883464f994bb30d09304274e1b1e1cdb2eb52a0f49ae3985
$ podman [options] pod stats --no-stream cec7af2e2f8e0479a91d94bff88cf8482c2907f7644eb7c90a9ea01b2f13ff22
Error: unknown FS magic on "/run/user/4902/netns/netns-a2804d97-802b-9e57-2e40-11c1206c102c": 1021994
[AfterEach] Podman pod stats
  /var/tmp/go/src/github.com[/containers/podman/test/e2e/pod_stats_test.go:33](https://github.com/containers/podman/blob/956677a741cdcce627dda4336f85c8fc0be83a5c/test/e2e/pod_stats_test.go#L33)
$ podman [options] pod rm -fa -t 0
time="2023-03-23T15:11:45-05:00" level=error msg="Unable to clean up network for container 652344ca5642edd24b93f810fcb9ebbb1c0969195a67454bc05b8a67e7ed3185: \"unmounting network namespace for container 652344ca5642edd24b93f810fcb9ebbb1c0969195a67454bc05b8a67e7ed3185: failed to unmount NS: at /run/user/4902/netns/netns-a2804d97-802b-9e57-2e40-11c1206c102c: invalid argument\""
         cec7af2e2f8e0479a91d94bff88cf8482c2907f7644eb7c90a9ea01b2f13ff22

About this issue

  • State: closed
  • Created a year ago
  • Comments: 25 (16 by maintainers)

Most upvoted comments

Ok, --this --that was a good hint: I now understand where it goes wrong. Here is a reproducer:

First make sure you have no podman processes or containers running, then kill the pause process to start from a clean system:

$ pkill "podman pause"
$ dir=$(mktemp -d)
$ podman --root $dir/root --runroot $dir/runroot --tmpdir $dir/tmp  run -d --name test alpine top
# Now this process will create a pause.pid file but not under $XDG_RUNTIME_DIR/libpod/tmp/pause.pid,
# instead it uses the --tmpdir value so $dir/tmp/pause.pid
# If you then use a regular podman command without the custom options:
$ podman run -d --name test alpine top
$ pgrep "podman pause"
18786
19198
# The normal command had no clue about the pause process so it created its own, so far no problem with that.
# The problem happens when we now use the command with the options again:
$ podman --root $dir/root --runroot $dir/runroot --tmpdir $dir/tmp  stop test
ERRO[0000] Unable to clean up network for container 6fc14126834157223df6318300affcce613b0a3acc0cdeec5e3e8df55a1335c1: "unmounting network namespace for container 6fc14126834157223df6318300affcce613b0a3acc0cdeec5e3e8df55a1335c1: failed to unmount NS: at /run/user/1000/netns/netns-34ab0c13-c7e0-6c3e-2f98-32d2f67bb4e1: invalid argument"

The bug here is that there is a shortcut in pkg/rootless/rootless_linux.c which always runs before any Go code (including the option parsing). The C code just sees $XDG_RUNTIME_DIR/libpod/tmp/pause.pid and immediately joins the namespace of that pause process. This shortcut is only there to join, so if the pause process does not exist it does nothing and lets podman handle the namespace and pause process creation.
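To make the lookup concrete, here is a simplified model of what the shortcut keys on. It is a sketch, not the code in rootless_linux.c: the function names are hypothetical, and it assumes pause.pid holds a single decimal PID. The point is that the path is derived from $XDG_RUNTIME_DIR alone, so a --tmpdir given on the command line plays no part in it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build the default pause.pid path from the
 * environment only. Returns 0 on success, -1 if XDG_RUNTIME_DIR is
 * unset/empty or the buffer is too small. */
int default_pause_pid_path(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    const char *xdg = getenv("XDG_RUNTIME_DIR");
    if (xdg == NULL || strlen(xdg) == 0)
        return -1;
    int n = snprintf(buf, len, "%s/libpod/tmp/pause.pid", xdg);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: read the PID stored in the file (assumed to be a
 * single decimal number). Returns the PID, or -1 if the file is missing
 * or unparsable, in which case the shortcut would do nothing. */
long read_pause_pid(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    long pid = -1;
    if (fscanf(f, "%ld", &pid) != 1 || pid <= 0)
        pid = -1;
    fclose(f);
    return pid;
}
```

In this model, a -1 from read_pause_pid corresponds to the "do nothing and let podman handle it" case; any positive PID means the namespace gets joined regardless of what options follow on the command line.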

So that is why you have to run with --tmpdir the first time, before the pause process PID file exists at $XDG_RUNTIME_DIR/libpod/tmp/pause.pid. If you do it the other way around, even the process with --tmpdir would join the namespace via the shortcut, so it would use the same one and not cause issues. See the possibility for flakes here?

The fix for this is of course to not take the shortcut when we see --tmpdir, so the podman Go code can handle it. Note that we already special-case other commands: https://github.com/containers/podman/blob/ac1d297fc76f4423d6f44b98c864476cbeffce86/pkg/rootless/rootless_linux.c#L378-L384
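A minimal sketch of such a check, assuming the shortcut has access to the raw argument vector. The function name is hypothetical and this is not the actual patch; it only illustrates scanning for --tmpdir in both its "--tmpdir DIR" and "--tmpdir=DIR" spellings before deciding whether to join.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical check: returns true if the command line carries a
 * --tmpdir option, in which case the early join shortcut should be
 * skipped and the decision left to the Go code. */
bool args_use_custom_tmpdir(int argc, char **argv)
{
    assert(argc == 0 || argv != NULL);
    /* argv[0] is the program name, so start at 1. */
    for (int i = 1; i < argc; i++) {
        const char *arg = argv[i];
        if (arg == NULL)
            break;
        if (strcmp(arg, "--tmpdir") == 0 ||
            strncmp(arg, "--tmpdir=", strlen("--tmpdir=")) == 0)
            return true;
    }
    return false;
}
```

The scan is deliberately conservative: it does no real option parsing, so it would also match a --tmpdir that is actually an argument to the container command. That should be harmless, since skipping the shortcut only means the Go code handles the namespace and pause process as usual.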