colima: Colima hangs a short while after starting

Description

Shortly after starting a colima instance it seems to hang, and I’m unable to ssh into the session.

% colima -v start -t vz -c 8 -m 8 --mount /Volumes/Void:w --mount-type virtiofs --mount /Volumes/Stash:w --mount-type virtiofs --mount /Volumes/Vacuum:w --mount-type virtiofs

% colima list
PROFILE    STATUS     ARCH       CPUS    MEMORY    DISK     RUNTIME    ADDRESS
default    Running    aarch64    8       8GiB      60GiB    docker     

# Roughly 5~10 minutes later
% colima ssh
FATA[0006] exit status 255

From ~/.lima/colima/ha.stderr.log I see:


{"level":"error","msg":"write unixgram -\u003e: write: no buffer space available","time":"2022-12-30T11:24:42-08:00"}
{"level":"error","msg":"cannot receive packets from , disconnecting: cannot read size from socket: read unixgram -\u003e: use of closed network connection","time":"2022-12-30T11:24:42-08:00"}
{"level":"error","msg":"virtual network error: \"cannot read size from socket: read unixgram -\u003e: use of closed network connection\"","time":"2022-12-30T11:24:42-08:00"}
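To see how often these errors occur, the host-agent log can be filtered with plain grep. This is a minimal sketch, assuming the log holds one JSON object per line as in the excerpt above; the function names are hypothetical and the default path is the one quoted in this report.

```shell
#!/bin/sh
# Hypothetical helpers; the default path is the log file quoted in the report.

# Count error-level entries (assumes one JSON object per line).
count_log_errors() {
  local log_file="${1:-$HOME/.lima/colima/ha.stderr.log}"
  # grep -c prints the number of matching lines; '|| true' stops a zero
  # count (grep exit status 1) from being treated as a failure.
  grep -c '"level":"error"' "$log_file" || true
}

# Print only the lines that mention the socket buffer exhaustion.
buffer_errors() {
  local log_file="${1:-$HOME/.lima/colima/ha.stderr.log}"
  grep -F 'no buffer space available' "$log_file" || true
}
```

Calling `count_log_errors` with no argument reads the default log; pass another path to inspect a different profile's log.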

Version

Colima Version:

colima version HEAD-88390f5
git commit: 88390f54bceb72e248044aa3b452b64c676d99d1

Lima Version: limactl version 0.14.2
Qemu Version: qemu-img version 7.2.0
macOS Version: 13.1 22C65

Operating System

  • macOS Intel <= 12 (Monterey)
  • macOS Intel >= 13 (Ventura)
  • macOS M1 <= 12 (Monterey)
  • macOS M1 >= 13 (Ventura)
  • Linux

Output of colima status

% colima status
FATA[0003] error retrieving current runtime: empty value

vm-type: vz
mount type: virtiofs

Reproduction Steps

  1. Start Colima
  2. 3 Docker containers start
  3. Wait about 5-10 minutes
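To pin down when the hang starts instead of guessing "5-10 minutes", the VM can be polled with a watchdog. This is a rough sketch, not something from the thread: `run_with_timeout` and `watch_for_hang` are hypothetical helper names, and the timeout is implemented in plain shell because macOS does not ship a `timeout` command by default.

```shell
#!/bin/sh
# Hypothetical helpers for timing the hang; not part of colima.

# Run a command, killing it if it is still alive after LIMIT seconds.
# Returns the command's exit status, or 124 if it had to be killed.
run_with_timeout() {
  local limit="$1"
  shift
  "$@" &
  local pid=$!
  local waited=0
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$limit" ]; then
      kill -9 "$pid" 2>/dev/null
      wait "$pid" 2>/dev/null
      return 124
    fi
    sleep 1
    waited=$((waited + 1))
  done
  wait "$pid"
}

# Poll 'colima ssh' every INTERVAL seconds until it fails or times out,
# then report how long the VM stayed reachable. The loop also ends if
# colima is simply not running, so check 'colima list' first.
watch_for_hang() {
  local interval="${1:-30}"
  local limit="${2:-10}"
  local start
  start=$(date +%s)
  while run_with_timeout "$limit" colima ssh -- true; do
    sleep "$interval"
  done
  echo "colima ssh stopped responding after $(( $(date +%s) - start ))s"
}
```

Running `watch_for_hang 30 10` right after `colima start` would probe every 30 seconds with a 10 second limit per probe.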

Expected behaviour

No response

Additional context

No response

About this issue

  • State: open
  • Created 2 years ago
  • Reactions: 5
  • Comments: 36 (6 by maintainers)

Most upvoted comments

@mdenna-synaptics Do you mind sharing your colima start command?

delete current colima session and settings

colima delete

configure colima to use virtiofs

colima start --vm-type vz --mount-type virtiofs

Note: This comment is preliminary. I hope to narrow things down and provide a setup where I can reproduce the issue reliably. This looks like gremlin territory, so no promises.

Not sure if this is related to the thread above, but I’m getting some hangs similar to what’s described in this thread.

The difference though is that I’m not using VZ. I’m using the default setup, with QEMU. It’s an Intel Mac, running Ventura 13.4.

This worked fine until today, when I added an extra container to my setup.

The hangs appear to happen at the end of a “composer install” or an “npm install”. The first time, the hang cured itself after a few minutes and the containers were accessible again. The second time, in the same session, it didn’t recover.

  • Control+C does not do anything.
  • colima status hangs.
  • colima version says it’s version 0.5.5, git commit 6251dc2c2c5d8197c356f0e402ad028945f0e830, then hangs
  • all docker commands hang

I did a colima stop -f, but I wasn’t kicked out of a container where I had a shell running - which was frozen. I had to kill -9 the docker-compose process that was running that shell.

Restarting colima seems to get things working again, so I don’t know how long it will take to reproduce the issue again. Colima had been running for about a week without a restart, but through computer sleeps, if that’s relevant.

Some filesystem synchronisation issues seem to be at play. Npm complains about permissions on a file in the cache. This cache is on a Docker volume. Composer gave an error when I ran it just now on a local mount point. What these two have in common is that they both operate on lots of small files very quickly. If this is related, I guess the filesystem could get stuck, bringing everything down with it.
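If the "lots of small files very quickly" theory is right, a synthetic workload might trigger the hang without a full npm or composer run. The sketch below is only an attempt to mimic that access pattern; `stress_small_files` is a hypothetical name and nothing in this thread confirms that it reproduces the problem.

```shell
#!/bin/sh
# Hypothetical stress helper: write COUNT one-byte files into DIR as fast
# as the shell allows, then print how many files were actually created.
stress_small_files() {
  local dir="$1"
  local count="${2:-10000}"
  local i=0
  mkdir -p "$dir"
  while [ "$i" -lt "$count" ]; do
    printf 'x' > "$dir/f$i"
    i=$((i + 1))
  done
  # The printed count should equal COUNT if every write succeeded.
  find "$dir" -type f -name 'f*' | wc -l | tr -d ' '
}
```

It would be run inside a container against a directory on the Docker volume or the local mount, for example `stress_small_files /path/to/mounted/dir 20000`, where the path is a placeholder.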

Context: My setup is basically based on this repository: https://gitlab.com/nucleware/docker-dev . Please excuse the lack of documentation. I made that to ease my multi-project PHP development setup, and it still had many pitfalls and annoyances. My volumes are mounted using the “local” driver, even though that repo defaults to nfs on a Mac.

Update 1: Restarting colima didn’t actually make everything work again. I could run a shell in my containers, but I couldn’t connect to my traefik container with my browser. I had to reboot my Mac to be able to connect. The filesystem problems are still there.

Same issue for me: I start my build in a container and after a couple of minutes Colima hangs. Ctrl-C doesn’t work and I can’t run colima ssh or colima stop.

I have an Intel MacBook Pro with macOS 13.6, using QEMU with sshfs. Note that:

FYI, I switched to vz with virtiofs; since then I haven’t seen the issue and the build is faster.

Experiencing a similar hang, on an Intel Mac.

All I’m doing is starting an ubuntu:22.04 image, installing some build dependencies, and trying to build binutils.

i.e.

apt update
apt install bison build-essential flex git libgmp-dev libmpfr-dev texinfo
git clone git://sourceware.org/git/binutils-gdb.git
cd binutils-gdb
CC=gcc ./configure
make

Not sure it’ll help, but it starts hanging at this point:

rm -f bfd-tmp.h
cp bfd-in3.h bfd-tmp.h
/bin/bash ./../move-if-change bfd-tmp.h bfd.h
rm -f bfd-tmp.h
touch stmp-bfd-h
  CC       archures.lo
  CC       targets.lo
  CC       dwarf2.lo
rm -f tofiles
f=""; \
for i in elf64-x86-64.lo elfxx-x86.lo elf-ifunc.lo elf-vxworks.lo elf64.lo elf.lo elflink.lo elf-attrs.lo elf-strtab.lo elf-eh-frame.lo elf-sframe.lo dwarf1.lo dwarf2.lo elf32-i386.lo elf32.lo pei-i386.lo peigen.lo cofflink.lo coffgen.lo pe-x86_64.lo pex64igen.lo pei-x86_64.lo elf64-gen.lo elf32-gen.lo plugin.lo cpu-i386.lo cpu-iamcu.lo  archive64.lo ; do \
  case " $f " in \
    *" $i "*) ;; \
    *) f="$f $i" ;; \
  esac ; \
done ; \
echo $f > tofiles
/bin/bash ./../move-if-change tofiles ofiles
touch stamp-ofiles
  CCLD     libbfd.la

EDIT: Actually, this may be quite important to the hang, but the build happens in a mounted directory. So the clone happens on the native file system.

The network stack for vz was updated in https://github.com/lima-vm/lima/pull/1383 (targeted for v0.16).

If you are interested in trying it out, build the latest lima master and test with this new network stack.

Thanks for reporting this. I have experienced the same with the VZ VM type and am still troubleshooting.

If you do not need the faster filesystem access, QEMU is more stable at the moment.