podman: IPv4 Default Route Does Not Propagate to Pasta Containers on Hetzner VPSes
Issue Description
When using the pasta networking back-end in Podman on a Hetzner cloud VPS, the IPv4 routing table visible inside the container is missing both the host-route entry for the host system’s configured gateway and the associated default route. As a result, the container cannot reach the internet over IPv4. Oddly enough, the IPv6 routing table appears to be complete, and IPv6 connectivity inside the container works without issue.
This issue does not occur on my home network: neither my home computer (Arch Linux) nor a Cortex-A53 development board (Fedora CoreOS) exhibits it. However, I have been able to reproduce it consistently on Hetzner VPSes under both Arch Linux and Fedora CoreOS, as well as under Hetzner’s customised Fedora Server distribution.
There is also no such issue when using Podman’s default network… but then I don’t get the benefits of using pasta.
Steps to reproduce the issue
$ ip route
$ podman run -it --rm --network=pasta alpine sh
# ip route
# ping -c 1 8.8.8.8
Describe the results you received
The ip route call on the host system outputs the system’s routing table. On my current VPS, this is:
$ ip route
default via 172.31.1.1 dev ens3 proto static metric 100
<first 3 octets of the VPS' public IPv4 address>/24 dev ens3 proto kernel scope link src <VPS' IPv4 address> metric 100
172.31.1.1 dev ens3 proto static scope link metric 100
The ip route call inside the container outputs a table without entries for the gateway and default route, i.e.
# ip route
<first 3 octets of the VPS' public IPv4 address>/24 dev ens3 proto kernel scope link src <VPS' IPv4 address> metric 100
As a result, attempts to reach the internet using IPv4 fail. IPv6 connectivity has no issues.
# ping -c 1 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
ping: sendto: Network unreachable
# ping -c 1 2404:e80::1337:af
PING 2404:e80::1337:af (2404:e80::1337:af): 56 data bytes
64 bytes from 2404:e80::1337:af: seq=0 ttl=255 time=257.822 ms
--- 2404:e80::1337:af ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 257.822/257.822/257.822 ms
NOTE: The <first 3 octets of the VPS' public IPv4 address> entry in the routing table is the result of an error in my static network configuration. I have left it in the above output because it demonstrates that parts of the routing table do get propagated to the container, just not the gateway and default route. In a correct configuration where that extraneous entry is not present in the host system’s routing table, the container’s routing table is just empty.
$ ip route
default via 172.31.1.1 dev ens3 proto static metric 100
172.31.1.1 dev ens3 proto static scope link metric 100
$ podman run -it --rm --network=pasta alpine sh
# ip route
<no output>
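Incidentally, the missing entries can be identified mechanically by diffing the two ip route outputs. A minimal sketch (the helper is my own; it compares whole lines, which is a simplification, and uses documentation addresses in place of the real VPS address):

```python
# Diff two `ip route` outputs (host vs. container) to show which
# host entries never made it into the container's table.

def route_diff(host_routes: str, container_routes: str) -> list[str]:
    """Return host routing-table lines that are absent from the container."""
    container = {line.strip() for line in container_routes.splitlines() if line.strip()}
    return [line.strip() for line in host_routes.splitlines()
            if line.strip() and line.strip() not in container]

# Tables modelled on the ones shown above; 203.0.113.0/24 is a documentation
# placeholder standing in for the VPS' real public subnet.
host = """\
default via 172.31.1.1 dev ens3 proto static metric 100
203.0.113.0/24 dev ens3 proto kernel scope link src 203.0.113.5 metric 100
172.31.1.1 dev ens3 proto static scope link metric 100
"""
container = """\
203.0.113.0/24 dev ens3 proto kernel scope link src 203.0.113.5 metric 100
"""

for missing in route_diff(host, container):
    print(missing)  # prints the gateway host route and the default route
```

In real output the container's lines may differ in proto or metric, so an exact-line diff is only a rough model, but it makes the gap visible at a glance.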
Describe the results you expected
I expected a complete routing table to be available inside the container, as is the case on my home network.
$ ip route
default via 192.168.0.1 dev end0 proto dhcp src 192.168.0.11 metric 100
192.168.0.0/24 dev end0 proto kernel scope link src 192.168.0.11 metric 100
$ podman run -it --rm --network=pasta alpine sh
# ip route show
default via 192.168.0.1 dev end0
192.168.0.0/24 dev end0 scope link src 192.168.0.11
# ping -c 1 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=42 time=22.281 ms
--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 22.281/22.281/22.281 ms
podman info output
$ podman info
host:
arch: amd64
buildahVersion: 1.30.0
cgroupControllers:
- cpu
- memory
- pids
cgroupManager: systemd
cgroupVersion: v2
conmon:
package: /usr/bin/conmon is owned by conmon 1:2.1.7-1
path: /usr/bin/conmon
version: 'conmon version 2.1.7, commit: f633919178f6c8ee4fb41b848a056ec33f8d707d'
cpuUtilization:
idlePercent: 99.54
systemPercent: 0.21
userPercent: 0.25
cpus: 1
databaseBackend: boltdb
distribution:
distribution: arch
version: unknown
eventLogger: journald
hostname: neoninteger-test-server
idMappings:
gidmap:
- container_id: 0
host_id: 1000
size: 1
- container_id: 1
host_id: 100000
size: 65536
uidmap:
- container_id: 0
host_id: 1000
size: 1
- container_id: 1
host_id: 100000
size: 65536
kernel: 6.3.1-arch2-1
linkmode: dynamic
logDriver: journald
memFree: 1747542016
memTotal: 2017873920
networkBackend: netavark
ociRuntime:
name: crun
package: /usr/bin/crun is owned by crun 1.8.3-2
path: /usr/bin/crun
version: |-
crun version 1.8.3
commit: 59f2beb7efb0d35611d5818fd0311883676f6f7e
rundir: /run/user/1000/crun
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
os: linux
remoteSocket:
path: /run/user/1000/podman/podman.sock
security:
apparmorEnabled: false
capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
rootless: true
seccompEnabled: true
seccompProfilePath: /etc/containers/seccomp.json
selinuxEnabled: false
serviceIsRemote: false
slirp4netns:
executable: /usr/bin/slirp4netns
package: /usr/bin/slirp4netns is owned by slirp4netns 1.2.0-1
version: |-
slirp4netns version 1.2.0
commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
libslirp: 4.7.0
SLIRP_CONFIG_VERSION_MAX: 4
libseccomp: 2.5.4
swapFree: 0
swapTotal: 0
uptime: 1h 3m 54.00s (Approximately 0.04 days)
plugins:
authorization: null
log:
- k8s-file
- none
- passthrough
- journald
network:
- bridge
- macvlan
- ipvlan
volume:
- local
registries: {}
store:
configFile: /home/neoninteger/.config/containers/storage.conf
containerStore:
number: 0
paused: 0
running: 0
stopped: 0
graphDriverName: overlay
graphOptions: {}
graphRoot: /home/neoninteger/.local/share/containers/storage
graphRootAllocated: 19977711616
graphRootUsed: 2542403584
graphStatus:
Backing Filesystem: extfs
Native Overlay Diff: "true"
Supports d_type: "true"
Using metacopy: "false"
imageCopyTmpDir: /var/tmp
imageStore:
number: 1
runRoot: /run/user/1000/containers
transientStore: false
volumePath: /home/neoninteger/.local/share/containers/storage/volumes
version:
APIVersion: 4.5.0
Built: 1681856273
BuiltTime: Wed Apr 19 07:47:53 2023
GitCommit: 75e3c12579d391b81d871fd1cded6cf0d043550a-dirty
GoVersion: go1.20.3
Os: linux
OsArch: linux/amd64
Version: 4.5.0
Podman in a container
No
Privileged Or Rootless
Rootless
Upstream Latest Release
Yes
Additional environment details
Reproduced in the following environments:
- Hetzner CPX21 running Fedora CoreOS (this server is now busy with something else and cannot be used for further testing)
- Hetzner CX11 running Fedora Server with Hetzner customisations (e.g. networking configured using cloud-init)
- Hetzner CX11 running Arch Linux (current deployment)
The CoreOS and Arch instances were custom OS deployments that did not use Hetzner’s cloud-init system. On these systems I tested both DHCP and manual IPv4 configuration; the issue occurs in both.
The issue does not occur on my home network, with either of the following devices:
- MacBookPro11,1 (Arch Linux)
- Hardkernel ODROID-C4 (Fedora CoreOS, aarch64 variant)
Additional information
It should be noted that the incomplete routing table seems to be the only issue here. If I manually add the missing routes from the host system, IPv4 connectivity within the container works.
$ podman run -it --rm --network=pasta --cap-add=NET_ADMIN alpine sh
# ping -c 1 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
ping: sendto: Network unreachable
# ip route add 172.31.1.1 dev ens3
# ip route add default via 172.31.1.1
# ping -c 1 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=42 time=7.861 ms
--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 7.861/7.861/7.861 ms
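This manual workaround can also be derived from the host’s routing table. A hedged sketch (the helper name is mine; it assumes the common default via <gateway> dev <iface> format and emits the same two commands used above):

```python
import re

def workaround_commands(host_routes: str) -> list[str]:
    """From host `ip route` output, build the two `ip route add` commands:
    an on-link host route to the gateway, then the default route via it."""
    m = re.search(r"^default via (\S+) dev (\S+)", host_routes, re.MULTILINE)
    if not m:
        return []  # no gateway-style default route found
    gateway, iface = m.group(1), m.group(2)
    return [
        f"ip route add {gateway} dev {iface}",   # make the gateway reachable on-link
        f"ip route add default via {gateway}",   # then route everything via it
    ]

host = "default via 172.31.1.1 dev ens3 proto static metric 100\n"
for cmd in workaround_commands(host):
    print(cmd)
```

Running the printed commands inside a container started with --cap-add=NET_ADMIN reproduces the fix shown in the transcript above.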
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 19 (6 by maintainers)
Commits related to this issue
- netlink: Add functionality to copy routes from outer namespace Instead of just fetching the default gateway and configuring a single equivalent route in the target namespace, on 'pasta --config-net',... — committed to AkihiroSuda/passt-mirror by sbrivio-rh a year ago
- conf, pasta: With --config-net, copy all routes by default Use the newly-introduced NL_DUP mode for nl_route() to copy all the routes associated to the template interface in the outer namespace, unle... — committed to AkihiroSuda/passt-mirror by sbrivio-rh a year ago
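Conceptually, the behaviour these commits introduce — copying every route bound to the template interface rather than synthesising a single default route — can be modelled as a filter over the host table. The helper below is my own illustration, not pasta’s actual netlink code:

```python
def routes_for_interface(host_routes: str, iface: str) -> list[str]:
    """Select every host route whose `dev` token names the given interface,
    modelling the 'copy all routes for the template interface' approach."""
    selected = []
    for line in host_routes.splitlines():
        tokens = line.split()
        if "dev" in tokens:
            i = tokens.index("dev")
            if i + 1 < len(tokens) and tokens[i + 1] == iface:
                selected.append(line.strip())
    return selected

host = """\
default via 172.31.1.1 dev ens3 proto static metric 100
172.31.1.1 dev ens3 proto static scope link metric 100
10.88.0.0/16 dev podman0 proto kernel scope link
"""
for route in routes_for_interface(host, "ens3"):
    print(route)  # both ens3 routes; the podman0 route is filtered out
```

This sidesteps the question the second commit message raises: rather than deciding which single route is safe to copy, everything attached to the interface is duplicated into the namespace.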
Thanks! And we finally have a version with the changes, 2023_06_03.429e1a7 (not yet in Arch Linux, in testing for Fedora 38).

@sbrivio-rh Did some basic testing on HEAD and seems to work. Thank you! 👍
I’ve just spent some time testing the patch series on the mailing list with the additional patch provided in the above comments on both my home network and VPS. I’ve tried models of all of the scenarios I envision using in the near-future, including:
As far as I can tell, all of the pasta functionality I intend to use works.

I don’t know whether David’s thoughts will end up changing anything or not. In case further revision of the patch series occurs, I’ll stay subscribed to the mailing list and test any new relevant proposals as time allows.
Thank you for taking the time to work on this.
pasta is proving to be an excellent solution for container networking and I look forward to continuing its use.

If any Podman maintainers wish to close this issue early, then please do so. If not, I’ll close it myself once a relevant patch series has been applied and is being made available by distribution package repositories.
Running Podman 4.5.0 with passt 96f8d55c4f5093fa59c168361c0428b53b6d2d06 with the patchset and the following additional patch applied (new server, but weirdly enough I got the same IP address again):
Works! Thank you very much @sbrivio-rh ❤️
I just posted a series that should fix this by optionally copying all the routes (and addresses) associated to the selected interface on pasta --config-net (enabled by default).

It turned out that it’s actually simpler to do that, rather than trying to figure out if we can happily copy a single route or if we should copy more than one.
I haven’t tried particularly hard to replicate the problematic setup, though, so testing and feedback would be very appreciated. Thanks!
I would keep it here to avoid unnecessary indirection.
Groan.
Right. I wonder why nowadays there seems to be an expectation for that kind of configuration to be in any way sane. In my opinion it spectacularly clashes with RFC 791 section 2.2:
(emphasis mine). Whatever, we can’t just go and “fix” that in all the possible environments.
Unless there are other preferences, I would add a workaround that configures, at least for those cases, or if there’s a default route for our outbound interface without a gateway, an equivalent route in the detached namespace. In your case, that would be the 172.31.1.1 dev ens3 scope link route, just like dhclient sets it up. We need to modify nl_route() in netlink.c a bit to support gateway-less routes (but we’ll need that anyway to cover this case).

Feel free to send a patch, or I might get to it later today (presumably tomorrow for you).