kubevirt: Cannot Connect to VM over NodePort service after VM restart

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened: I created a VM and a corresponding NodePort Service for SSH, then restarted the VM by deleting its pod.

What you expected to happen: After the VM starts back up, I expected to be able to reconnect to it.

How to reproduce it (as minimally and precisely as possible): I am using an Ubuntu image, but any image should do. Create a VM exposed through a NodePort Service, wait until the VM has fully started, then delete the pod and let it start back up. Wait a few minutes, then try to SSH into the VM. You should get a connection timeout.
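For reference, an SSH NodePort Service of the kind described in the reproduction steps might look like the sketch below. The Service name is hypothetical. The selector assumes the template label `kubevirt.io/vm: michael-testme-vm` used in the VM spec posted later in this thread, which KubeVirt copies onto the virt-launcher pod. `nodePort` is left unset so Kubernetes allocates one.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: michael-testme-vm-ssh   # hypothetical name
spec:
  type: NodePort
  selector:
    # Must match a label on the VMI's virt-launcher pod (assumed from the VM template)
    kubevirt.io/vm: michael-testme-vm
  ports:
  - name: ssh
    protocol: TCP
    port: 22
    targetPort: 22   # sshd inside the guest, reached through masquerade binding
```

After the pod is deleted and recreated, `kubectl get endpoints michael-testme-vm-ssh` should list the new pod IP. That matches the observation below that the endpoints are updated even though SSH still fails.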

Anything else we need to know?: I can see that the VM endpoints are being updated in Kubernetes. With tcpdump, I can also see traffic coming into the container, but it never reaches the VM. Is it possible that the container does not rebuild that network correctly when NodePort is the service type? The image is ubuntu-20.04-minimal-cloudimg-amd64.img.

Environment:

  • KubeVirt version (use virtctl version): v0.45.0
  • Kubernetes version (use kubectl version): v1.20.6
  • VM or VMI specifications: Cannot be copied here, but the VM itself works
  • Cloud provider or hardware configuration: Bare-metal hardware; specifications cannot be listed
  • OS (e.g. from /etc/os-release): Ubuntu is installed on the hardware
  • Kernel (e.g. uname -a): Cannot be listed
  • Install tools: The VM is deployed by a Helm chart; details cannot be listed

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 54 (31 by maintainers)

Most upvoted comments

Okay, so now that we have a workaround, what would you all (@rmohr @phoracek @maiqueb) like me to do about this GitHub issue? Simply close it? Tie it to a different ticket? I want to make sure you get the closure you are looking for before I close the issue. My workaround works fantastically.

I think it would be great if we got an out-of-the-box improvement for that too, but it's great to hear that it works for you.

So, to wrap this up: the issue was not in NodePort forwarding, nor in the network binding of VMs, but in the fact that the guest received a different MAC than it remembered from the previous run. Would that be correct, @mjschmidt? If so, the solution is one of:

  1. Set a static MAC yourself in the VM spec
  2. Implement static MAC assignment in KubeVirt
  3. Deploy KubeMacPool alongside KubeVirt
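Option 1 can be sketched as a fragment of the VM's `spec.template.spec.domain.devices.interfaces` list, using the `macAddress` field of a KubeVirt interface. The address `02:00:00:aa:bb:01` is an arbitrary illustrative value that does not come from this thread. Pick one that is unique in your cluster. A first octet of `02` marks it as a locally administered unicast address.

```yaml
# Fragment of a VirtualMachine: spec.template.spec.domain.devices
interfaces:
- masquerade: {}
  model: virtio
  name: default
  # Hypothetical example address. Pinning it keeps the guest's NIC identity
  # stable when the virt-launcher pod is deleted and recreated.
  macAddress: "02:00:00:aa:bb:01"
```

Because the MAC now lives in the VM spec rather than being generated per pod, every new VMI created from this VM presents the same address to the guest.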

@mjschmidt, did you (without setting a static MAC) try again a few minutes after you got "no route to host"? For me, after two minutes or so, I was able to connect even without setting a static MAC, and I made sure the VMI did get a different MAC after the recreation.

Yes. The reason yours was working is that you were not maintaining VM persistence, so each restart was effectively like deploying a new VM.

Thanks @rmohr, that's an interesting direction to investigate. Maybe it takes time for the VM to be updated with the correct eth device? For me, after reproducing the issue, I tried a few more times, and eventually it worked and I was able to SSH into the VMI, as can be seen in the previous comment.

Most of the time it is related to MAC address changes on restarts. Many init procedures use the MAC address as the device identifier. If it changes, they will no longer treat the eth device as the same device, even if the eth name stays the same. If that is the cause, setting a MAC address from the outside (in the VM spec) should resolve it. In many cases an early-boot init script that deletes the MAC from the config files in the guest also helps. The init systems then normally fall back to the device name (and will then write the new MAC into their configs, ugh … 😉 ).
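A related alternative to deleting the MAC from guest configs is to match the NIC by name from the start. This assumes an Ubuntu guest that uses netplan, as the focal cloud image does. The file name `60-by-name.yaml`, the `any-ethernet` ID, and the `en*` pattern are illustrative assumptions. cloud-init's generated file (typically `/etc/netplan/50-cloud-init.yaml`) may still contain a `macaddress` match, which simply stops matching once the MAC changes.

```yaml
# Hypothetical /etc/netplan/60-by-name.yaml inside the guest.
# Matches any NIC whose name starts with "en", so a device that comes back
# with a new MAC after a pod restart still gets DHCP.
network:
  version: 2
  ethernets:
    any-ethernet:          # arbitrary ID; the match block selects the device
      match:
        name: "en*"
      dhcp4: true
```

Run `sudo netplan apply` inside the guest (or reboot) to activate it. This is a sketch of the "fall back to the device name" idea above, not a tested fix for this issue.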

something like this ^

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  annotations:
    kubevirt.io/latest-observed-api-version: v1
    kubevirt.io/storage-observed-api-version: v1alpha3
  name: michael-testme-vm
  labels:
    kubevirt.io/vm: michael-testme-vm
spec:
  dataVolumeTemplates:
  - metadata:
      name: michael-testme-vm
    spec:
      pvc:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 30Gi
        volumeMode: Block
      source:
        http:
          url: https://cloud-images.ubuntu.com/minimal/releases/focal/release-20200423/ubuntu-20.04-minimal-cloudimg-amd64.img
  running: true
  template:
    metadata:
      labels:
        kubevirt.io/vm: michael-testme-vm
    spec:
      domain:
        cpu:
          cores: 2
          sockets: 1
          threads: 1
        devices:
          disks:
          - disk:
              bus: virtio
            name: disk-0
          - disk:
              bus: virtio
            name: cloudinitdisk
          inputs:
          - bus: usb
            name: tablet
            type: tablet
          interfaces:
          - masquerade: {}
            model: virtio
            name: default
        machine:
          type: q35
        resources:
          requests:
            memory: 4Gi
      hostname: michael-testme-vm
      networks:
      - name: default
        pod: {}
      volumes:
      - dataVolume:
          name: michael-testme-vm
        name: disk-0
      - cloudInitNoCloud:
          userData: |
            #cloud-config
            password: testpass
            chpasswd:
              expire: false
            ssh_pwauth: true
        name: cloudinitdisk