podman: podman machine on macOS becomes unresponsive after some time

Issue Description

My podman machine stops responding after some time.

The CLI is not responding, but podman machine ls says the machine is running:

$ podman machine ls 

NAME                     VM TYPE     CREATED     LAST UP            CPUS        MEMORY      DISK SIZE
podman-machine-default*  qemu        2 days ago  Currently running  4           5.859GiB    100GiB

podman version 4.7.2

Steps to reproduce the issue

  1. Create a podman machine and start it
  2. Leave it running for a while
  3. After some hours, the machine stops answering

Describe the results you received

Connections are refused or hang.

podman ps blocks.

podman machine ssh blocks as well.

Describe the results you expected

The machine should keep responding.

podman info output

macOS Sonoma

Output of podman machine inspect:

[
     {
          "ConfigPath": {
               "Path": "/Users/benoitf/.config/containers/podman/machine/qemu/podman-machine-default.json"
          },
          "ConnectionInfo": {
               "PodmanSocket": {
                    "Path": "/Users/benoitf/.local/share/containers/podman/machine/qemu/podman.sock"
               },
               "PodmanPipe": null
          },
          "Created": "2023-11-06T18:24:20.266785+01:00",
          "Image": {
               "IgnitionFilePath": {
                    "Path": "/Users/benoitf/.config/containers/podman/machine/qemu/podman-machine-default.ign"
               },
               "ImageStream": "testing",
               "ImagePath": {
                    "Path": "/Users/benoitf/.local/share/containers/podman/machine/qemu/podman-machine-default_fedora-coreos-38.20231027.2.0-qemu.aarch64.qcow2"
               }
          },
          "LastUp": "2023-11-07T10:49:04.99063+01:00",
          "Name": "podman-machine-default",
          "Resources": {
               "CPUs": 4,
               "DiskSize": 100,
               "Memory": 6000
          },
          "SSHConfig": {
               "IdentityPath": "/Users/benoitf/.ssh/podman-machine-default",
               "Port": 55861,
               "RemoteUsername": "core"
          },
          "State": "running",
          "UserModeNetworking": true,
          "Rootful": false
     }
]

Podman in a container

No

Privileged Or Rootless

None

Upstream Latest Release

Yes

Additional environment details

Additional information

I started podman with DEBUG output so that I have a QEMU window.

(screenshot of the QEMU console)

In it we can see repeated virtio_net virtio0 enp0s1: TX timeout on queue: 0, sq: output.0, vq: 0x1, name: output.0 messages.

To recover, I need to run ifconfig enp0s1 down and then ifconfig enp0s1 up, after which networking is restored in the machine.
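
For reference, a minimal sketch of that workaround, run from the QEMU console inside the guest since SSH hangs while networking is down. The interface name enp0s1 comes from the messages above; the ip link form is the iproute2 equivalent and is my assumption, not something shown in the thread:

# run inside the guest, from the QEMU console window
sudo ifconfig enp0s1 down
sudo ifconfig enp0s1 up

# iproute2 equivalent
sudo ip link set enp0s1 down
sudo ip link set enp0s1 up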

podman info output once networking is working again:

host:
  arch: arm64
  buildahVersion: 1.32.0
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.7-2.fc38.aarch64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.7, commit: '
  cpuUtilization:
    idlePercent: 99.66
    systemPercent: 0.15
    userPercent: 0.18
  cpus: 4
  databaseBackend: boltdb
  distribution:
    distribution: fedora
    variant: coreos
    version: "38"
  eventLogger: journald
  freeLocks: 2038
  hostname: localhost.localdomain
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 1000000
    uidmap:
    - container_id: 0
      host_id: 501
      size: 1
    - container_id: 1
      host_id: 100000
      size: 1000000
  kernel: 6.5.8-200.fc38.aarch64
  linkmode: dynamic
  logDriver: journald
  memFree: 2644602880
  memTotal: 6041264128
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.8.0-1.fc38.aarch64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.8.0
    package: netavark-1.8.0-2.fc38.aarch64
    path: /usr/libexec/podman/netavark
    version: netavark 1.8.0
  ociRuntime:
    name: crun
    package: crun-1.10-1.fc38.aarch64
    path: /usr/bin/crun
    version: |-
      crun version 1.10
      commit: c053c83c57551bca13ead8600237341818975974
      rundir: /run/user/501/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20231004.gf851084-1.fc38.aarch64
    version: |
      pasta 0^20231004.gf851084-1.fc38.aarch64-pasta
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/user/501/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.2-1.fc38.aarch64
    version: |-
      slirp4netns version 1.2.2
      commit: 0ee2d87523e906518d34a6b423271e4826f71faf
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 0
  swapTotal: 0
  uptime: 24h 11m 57.00s (Approximately 1.00 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /var/home/core/.config/containers/storage.conf
  containerStore:
    number: 7
    paused: 0
    running: 0
    stopped: 7
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/home/core/.local/share/containers/storage
  graphRootAllocated: 106769133568
  graphRootUsed: 7597252608
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 8
  runRoot: /run/user/501/containers
  transientStore: false
  volumePath: /var/home/core/.local/share/containers/storage/volumes
version:
  APIVersion: 4.7.0
  Built: 1695839065
  BuiltTime: Wed Sep 27 20:24:25 2023
  GitCommit: ""
  GoVersion: go1.20.8
  Os: linux
  OsArch: linux/arm64
  Version: 4.7.0

About this issue

  • State: open
  • Created 8 months ago
  • Reactions: 5
  • Comments: 65 (21 by maintainers)

Most upvoted comments

Only the network interface served by gvproxy, i.e. -netdev socket,id=vlan,fd=3 -device virtio-net-pci,netdev=vlan, is dead.

This is consistent with what Florent pointed out in the issue description: ifdown/ifup on the interface served by gvproxy fixes networking. Reloading the virtio-net module has the same effect.
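
A minimal sketch of that module reload, assuming the guest kernel ships virtio-net as a loadable module named virtio_net (which the TX-timeout messages above suggest):

# run inside the guest; this briefly removes all virtio NICs
sudo modprobe -r virtio_net
sudo modprobe virtio_net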

Results this morning are promising! The VM is perfectly fine and operational. Are there any expected differences between vfkit and qemu?

Like disk or network performance?

Reopening, as I'm hitting the issue again with everything up to date.

When running podman with --debug, I also noticed that the QEMU VM itself seems to be doing just fine. If I define an additional network interface via QEMU options, I can still reach the VM without any trouble. Only the network interface served by gvproxy, i.e. -netdev socket,id=vlan,fd=3 -device virtio-net-pci,netdev=vlan, is dead.
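
The comment doesn't show the exact options used; as a hedged sketch, a second NIC that bypasses gvproxy can be added with stock QEMU user-mode networking plus an SSH port forward (the flags below are standard QEMU; injecting them into the machine's generated QEMU command line is an assumption):

# hypothetical extra NIC, independent of gvproxy
-netdev user,id=net1,hostfwd=tcp:127.0.0.1:2222-:22 \
-device virtio-net-pci,netdev=net1

# then, from the host, bypassing gvproxy:
ssh -i ~/.ssh/podman-machine-default -p 2222 core@127.0.0.1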

For the record, I’m running macOS Sonoma 14.1.1 (23B81) @ Apple M1 Pro.

podman version 4.7.2

Yes, there is an issue with relabeling read-only files from the Mac, but everything else works.

Let’s go !

╰─ export CONTAINERS_MACHINE_PROVIDER=applehv

╰─ brew tap cfergeau/crc && brew install vfkit
...
🍺  /opt/homebrew/Cellar/vfkit/0.5.0: 5 files, 15.7MB, built in 1 minute 14 seconds
...


╰─ podman machine init --cpus 8 --memory 12000 --rootful --user-mode-networking --now
Downloading VM image: fedora-coreos-39.20231204.2.1-applehv.aarch64.raw.gz: done
Extracting compressed file: podman-machine-default_fedora-coreos-39.20231204.2.1-applehv.aarch64.raw: done
Machine init complete
Starting machine "podman-machine-default"
...

Machine "podman-machine-default" started successfully

We need to find a pattern so we can identify what is going on … users reported that using the official release binaries, and NOT brew, made this problem go away. Do others find this to be true?
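
For anyone comparing, a quick way to check which build is first on your PATH (the /opt/podman path is where the official macOS installer typically places binaries; treat the exact paths as assumptions):

command -v podman    # e.g. /opt/homebrew/bin/podman (brew) vs /opt/podman/bin/podman (official installer)
podman --version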

@CA-Demetriade I would suggest running podman machine with --log-level=DEBUG (so you have a QEMU prompt).

Connect with podman machine ssh, change the password, and then log in on the QEMU console.

If it freezes you can do some inspection from the QEMU terminal (even when networking is not working).
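
A minimal sketch of that preparation, done while the machine is still healthy (core is the machine's default user per the inspect output; a password is needed because the QEMU console login does not use SSH keys):

# while SSH still works:
podman machine ssh
sudo passwd core    # set a password for console login
exit

# later, when SSH hangs, log in as core in the QEMU console window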

@cfergeau I would agree about gvproxy as a plausible place to start digging … one thing I am having trouble reconciling, though, is why removing the module and re-inserting it would “fix” gvproxy?

@benoitf is it possible to confirm that the machine never hibernated before exhibiting this behavior?
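
To help answer that, one hedged way to correlate host sleep/wake events with the failure on macOS (pmset is a standard macOS tool; the exact grep pattern depends on its log format):

# on the macOS host: recent sleep/wake events
pmset -g log | grep -iE "sleep|wake" | tail -n 20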