colima: Network in containers breaks under heavy network load
Networking breaks inside containers when they open many network connections at the same time.
I noticed this behaviour, e.g., while downloading Python dependencies. When multiple packages are downloaded concurrently, I start getting a "Network is unreachable" error. When I then log in to the underlying QEMU machine (limactl shell colima), I can see that it can’t reach any network address; I cannot even ping 8.8.8.8. My host computer doesn’t have any connection issues.
It gets better after a few moments of inactivity. Restarting the QEMU machine (colima stop && colima start) fixes the network, but the problem comes back when I increase the network load.
I can reproduce this problem consistently. I created a minimal setup to demonstrate it: https://github.com/mjkonarski-b/colima-poc
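Independently of that repository, here is a minimal sketch of the same kind of load. It opens many TCP connections concurrently and counts how many fail. It assumes Python 3.9+ inside the container. The function names, defaults and target host are hypothetical choices, not taken from the colima-poc setup. If the VM's network drops out, the failure count should jump. Errors such as "Network is unreachable" surface as OSError, which the sketch counts as a failure.

```python
"""Hypothetical load generator: open many TCP connections at once and count
failures. This is an illustration, not the script from the colima-poc repo."""
import socket
from concurrent.futures import ThreadPoolExecutor


def try_connect(host: str, port: int, timeout: float) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ENETUNREACH ("Network is unreachable"), timeouts and refused
        # connections are all OSError subclasses; count them as failures.
        return False


def burst(host: str, port: int, count: int = 100, workers: int = 32,
          timeout: float = 5.0) -> tuple[int, int]:
    """Make `count` connection attempts, `workers` of them in parallel.

    Returns a (succeeded, failed) pair.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda _: try_connect(host, port, timeout),
                                range(count)))
    succeeded = sum(results)
    return succeeded, len(results) - succeeded
```

For example, you could call burst("pypi.org", 443) from inside a container, raising count and workers until failures appear. Running the same call on the host helps separate VM-side problems from host-side ones.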
I experience this problem on multiple MacBooks, so it doesn’t seem to be tied to a particular processor or macOS version:
- MBP 2021 M1 Pro with 12.1 Monterey
- MBP 2019 i7 with 12.1 Monterey
- MBP 2019 i7 with 11.5.2 Big Sur
$ colima version
colima version 0.3.2
git commit: 272db4732b90390232ed9bdba955877f46a50552
runtime: docker
arch: aarch64
client: v20.10.11
server: v20.10.11
$ limactl --version
limactl version 0.8.1
$ qemu-img --version
qemu-img version 6.2.0
Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers
About this issue
- State: closed
- Created 2 years ago
- Reactions: 21
- Comments: 47 (18 by maintainers)
I was having this issue and was able to work around it by adding the following to ~/.lima/_config/override.yaml
Just an FYI that I have made notable progress with this.
Switching to PTP-based networking (thanks @elventear) reduced the required dependencies to just vde_vmnet, which turned out to be easy to bundle with Colima due to its small size.
In addition to fixing this issue (hopefully for good), all VMs also get IP addresses that are reachable from the host. This fixes https://github.com/abiosoft/colima/issues/189, https://github.com/abiosoft/colima/issues/97 and https://github.com/abiosoft/colima/issues/71, and provides a workaround for https://github.com/abiosoft/colima/issues/135.
Hi, same issue here on 5 different MBP machines. When pulling multiple images at the same time with docker-compose, the network breaks and I get an "unreachable" error or an i/o timeout. It would be great if the problem could be addressed soon.
I did more investigation, but I couldn’t find the root cause. So far it seems that the problem lies in Lima or QEMU itself. I could reproduce it on machines running raw Lima images, without Colima. I found two issues in the Lima repo that seem to describe the very same problem: https://github.com/lima-vm/lima/issues/537 https://github.com/lima-vm/lima/issues/561
@jasoncodes launchd is used mainly to keep it running as a background process. I could borrow the approach used by Lima, or find a way to tie it to the qemu process.
Thanks, your feedback has been helpful.
Yes, I’m more than happy to test any development branches you may have. Looking forward to having a release with built-in support for VDE networking. Thanks for your great work. 😃
Aside: Is there a documented uninstall process anywhere? Previously, running colima delete on all profiles (followed by a brew uninstall) would clean everything up. Now we also have /opt/colima, which is not removed automatically. Might be worth adding something to the README?
I just gave this a go with HEAD-5e2e413 and initially got the following output during colima start:
~/.colima/network/vmnet.stderr contained the following:
After reviewing the generated ~/Library/LaunchAgents/com.abiosoft.colima.colima.plist file, I created /etc/sudoers.d/colima with the following:
colima start now runs cleanly. lima0 is set up as 192.168.106.2 and is the default IPv4 route. Outbound TCP and ICMP are working well.
Edit: See https://github.com/abiosoft/colima/issues/140#issuecomment-1073375002. I had a custom /etc/sudoers.d/colima; removing this file fixes things.
DNS is still using the user-mode network, which I have found to be unreliable under some DNS-heavy loads, even when all other traffic is routed via lima0. I’m using the following ~/.lima/_config/override.yaml to use lima0 for DNS:
With a couple more optional tweaks, I can also get direct IP access to containers from the host:
The following in /etc/docker/daemon.json (along with sudo rc-service docker restart) ensures Docker Compose networks use 172.17.0.0/16 too, avoiding the need to add extra host routes for these Docker networks:
@abiosoft I have noticed that after running for a while, eth0 got added back as a default route, I assume due to some network or power event. I am thinking a solution is to stop eth0 from being considered for a default route. It seems the right way to do this in Alpine is:
Currently testing this.
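You can run a small check to confirm whether eth0 has been added back as a default route. It reads the output of ip route inside the VM (for example via limactl shell colima). This is a hypothetical Python sketch, not the Alpine configuration mentioned in the comment above, and it only detects the condition. It assumes iproute2-style output. It also assumes that eth0 is the user-mode (SLIRP) interface while lima0 is the vde_vmnet one.

```python
"""Hypothetical helper: list which interfaces carry a default route.

Expects iproute2-style text as printed by `ip route`, e.g. (gateway
addresses here are illustrative only):
    default via 192.168.106.1 dev lima0 metric 201
    default via 192.168.5.2 dev eth0 metric 202
"""


def default_route_devices(ip_route_output: str) -> list[str]:
    """Return the 'dev' name of every default route, in output order."""
    devices = []
    for line in ip_route_output.splitlines():
        fields = line.split()
        # Skip non-default routes and lines that name no device.
        if not fields or fields[0] != "default" or "dev" not in fields:
            continue
        idx = fields.index("dev")
        if idx + 1 < len(fields):
            devices.append(fields[idx + 1])
    return devices


def eth0_is_default(ip_route_output: str) -> bool:
    """True if eth0 (assumed here to be the user-mode/SLIRP interface)
    currently has a default route."""
    return "eth0" in default_route_devices(ip_route_output)
```

Inside the VM, you could get live input from subprocess.run(["ip", "route"], capture_output=True, text=True).stdout. Run it periodically to catch the moment eth0 reappears.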
With this workaround, I was able to get the desired result with this test https://github.com/abiosoft/colima/issues/140#issuecomment-1025634300.
I will keep an eye on the upstream issue. And in the meantime I will look at implementing this workaround in Colima.
I dug deeper into this issue and have been able to work around it within Lima using PTP-based networking, as reported in lima-vm/lima#724. It would be nice to be able to make this all work seamlessly without manually managing the colima template or the vde_vmnet process.
One half-solution is to add the following to ~/.lima/_config/override.yaml:
This injects the PTP network into the colima image without changing the template, but it requires manually starting the vde_vmnet process and deleting the default route that goes through the SLIRP network.
I think it is more related to https://github.com/lima-vm/lima/issues/561. It is specific to macOS and not reproducible on Linux, which makes me think it has something to do with macOS networking.