longhorn: [TEST] Run e2e test cases on Photon OS to see if any failures there

What’s the test to develop? Please describe

Run the e2e test cases on Photon OS (https://vmware.github.io/photon/) to see whether there are any failures.

Describe the tasks for the test

Additional context

About this issue

  • State: closed
  • Created 4 months ago
  • Comments: 15 (15 by maintainers)

Most upvoted comments

Update about building AMI for Photon OS Real Time kernel

I am documenting the process I went through so that we don’t duplicate work, and in case you have any advice for me. cc @yangchiu @chriscchien @mantissahz

1. Using the rebuild Photon OS AMI (failed)

2. Running a VM using ISO image of Photon RT OS and export the VM to AMI (failed)

  • According to the download page, the only installation method that provides the real-time kernel appears to be the ISO image, specifically photon-rt-5.0-dde71ec57.x86_64.iso at https://packages.vmware.com/photon/5.0/GA/iso/
  • So I decided to run a VM from that ISO image and then export the VM to an AMI.
  • First, I downloaded the VMware Fusion Pro application on a Mac with an Intel chip (I also tried VirtualBox, but it didn’t work; VMware’s OS doesn’t seem to play nicely with the Oracle product).
  • Using VMware Fusion, I created a VM from the Photon RT OS ISO image. The Photon team officially supports this, as mentioned in their doc https://vmware.github.io/photon/docs-v5/installation-guide/run-photon-on-fusion/
  • Then I stopped the VM and exported it as an OVA file, following this doc https://docs.vmware.com/en/VMware-Fusion/13/com.vmware.fusion.using.doc/GUID-16E390B1-829D-4289-8442-270A474C106A.html
  • Finally, I imported the VM to create an AMI, following this AWS doc https://docs.aws.amazon.com/vm-import/latest/userguide/vmimport-image-import.html
  • Result: failed. I hit multiple errors, such as:
    {
        "ImportImageTasks": [
            {
                "Description": "A Photon RT 5.0 disk",
                "ImportTaskId": "import-ami-0b99fd43bf0302d00",
                "SnapshotDetails": [
                    {
                        "DeviceName": "/dev/sde",
                        "DiskImageSize": 574307328.0,
                        "Format": "VMDK",
                        "Status": "completed",
                        "UserBucket": {
                            "S3Bucket": "phan-ami",
                            "S3Key": "vms/VMware-Photon-OS-64-bit.ova"
                        }
                    }
                ],
                "Status": "deleted",
                "StatusMessage": "ClientError: Multiple different grub/menu.lst files found.",
                "Tags": []
            }
        ]
    }
    
    
    OR
    {
        "ImportImageTasks": [
            {
                "Description": "A Photon RT 5.0 disk",
                "ImportTaskId": "import-ami-02a60a61cdab26957",
                "SnapshotDetails": [
                    {
                        "DeviceName": "/dev/sde",
                        "DiskImageSize": 566789632.0,
                        "Format": "VMDK",
                        "Status": "completed",
                        "UserBucket": {
                            "S3Bucket": "phan-ami",
                            "S3Key": "vms/VMware-Photon-OS-64-bit.ova"
                        }
                    }
                ],
                "Status": "deleted",
                "StatusMessage": "ClientError: Unsupported GRUB configuration - Unable to determine kernel version",
                "Tags": []
            }
        ]
    }
    
  • After a day of investigation, I couldn’t figure out how to resolve these errors, so I am stuck here.
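When several import attempts are made, it can help to pull just the task ID, status, and error message out of the JSON shown above. The following is a minimal sketch, assuming the text comes from `aws ec2 describe-import-image-tasks`; the helper names `summarize_import_tasks` and `failed_tasks` are made up for illustration, and the sample is a trimmed copy of the two outputs above.

```python
import json

# Trimmed copy of the two outputs shown above. In practice this text would
# come from `aws ec2 describe-import-image-tasks`.
SAMPLE = """
{
    "ImportImageTasks": [
        {
            "ImportTaskId": "import-ami-0b99fd43bf0302d00",
            "Status": "deleted",
            "StatusMessage": "ClientError: Multiple different grub/menu.lst files found."
        },
        {
            "ImportTaskId": "import-ami-02a60a61cdab26957",
            "Status": "deleted",
            "StatusMessage": "ClientError: Unsupported GRUB configuration - Unable to determine kernel version"
        }
    ]
}
"""


def summarize_import_tasks(raw_json):
    """Return a (task id, status, message) tuple for every task in the JSON text."""
    tasks = json.loads(raw_json).get("ImportImageTasks", [])
    return [
        (t.get("ImportTaskId", ""), t.get("Status", ""), t.get("StatusMessage", ""))
        for t in tasks
    ]


def failed_tasks(summary):
    """Keep only the tasks whose status message starts with ClientError."""
    return [row for row in summary if row[2].startswith("ClientError")]


if __name__ == "__main__":
    for task_id, status, message in failed_tasks(summarize_import_tasks(SAMPLE)):
        print(f"{task_id} [{status}] {message}")
```

Both failures above are GRUB-related `ClientError` messages, so a summary like this mainly makes it easier to compare attempts side by side.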

3. Running a VM using ISO image of Photon RT OS on Equinix Metal (Trying)

Since the first two methods failed, I am trying Equinix Metal.

3.1 Provisioning with Custom iPXE (failed)

  1. It looks like netboot.xyz only has Photon OS 4.0
  2. When we supply the script with the Photon RT ISO, following this instruction https://deploy.equinix.com/developers/docs/metal/operating-systems/custom-ipxe/#:~:text=If you want,entering the commands%3A, the installation gets stuck at:
    MEMDISK 6.03 20150819  Copyright 2001-2014 H. Peter Anvin et al                 
    e820: 000000005ff31000 00000000000cf000 1                                       
    e820: 0000000100000000 000000077f800000 1                                       
    El Torito BVD sanity check failed.                                              
    1588: 0xffff  15E801: 0x3c00 0x1457                                             
    e820: 0000000000082c00 0000000000004c00 2                                       
    e820: 0000000060000000 0000000008000000 2                                       
    e820: 00000000e0000000 0000000010000000 2     
    

3.2 Install any Operating System from an ISO image source (failed)

I followed this guide to install an arbitrary ISO on Equinix Metal: https://deploy.equinix.com/developers/guides/install-any-os-via-iso/. However, I always get a black screen at this step: https://deploy.equinix.com/developers/guides/install-any-os-via-iso/#:~:text=When prompted%2C enter,with the desktop. I am not sure what the root cause is.

3.3 Investigate how Harvester is able to do iPXE boot (trying)

It looks like Harvester has some documentation about this: https://github.com/harvester/ipxe-examples/blob/main/equinix/README.md. I am investigating this one.

4. Ask Photon team to help to push an AMI for Photon RT OS

If all the methods above fail, we may have to ask the Photon team to help push an AMI for Photon RT OS.

Update on the regression test in my local environment on Photon 5 RT OS:

  1. test_support_bundle.py::test_support_bundle_should_not_timeout: needs at least 11G of memory on the worker nodes.
  2. test_settings.py::test_instance_manager_cpu_reservation: needs worker nodes with exactly 4 CPUs. The other test cases are mostly the ones mentioned in https://github.com/longhorn/longhorn/issues/8113#issuecomment-1985371533
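The two requirements above can be checked before starting a run. The following is a rough sketch, assuming the CPU and memory values are Kubernetes quantity strings as reported in a node's capacity, and assuming "11G" means 11 GiB; the function names and the sample capacities are hypothetical and not taken from the cluster in this thread.

```python
# Suffixes used by Kubernetes memory quantities (binary and decimal).
_UNITS = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}

MIN_MEMORY_BYTES = 11 * 1024**3  # assumption: "11G" is read as 11 GiB
REQUIRED_CPUS = 4                # test_instance_manager_cpu_reservation


def memory_bytes(quantity):
    """Convert a quantity such as '16374268Ki' or '12Gi' to bytes."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # plain number of bytes


def cpu_cores(quantity):
    """Convert a CPU quantity such as '4' or '4000m' to a number of cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def check_worker(name, cpu, memory):
    """Return a list of human-readable problems for one worker node."""
    problems = []
    if cpu_cores(cpu) != REQUIRED_CPUS:
        problems.append(f"{name}: has {cpu} CPUs, test expects {REQUIRED_CPUS}")
    if memory_bytes(memory) < MIN_MEMORY_BYTES:
        problems.append(f"{name}: memory {memory} is below 11Gi")
    return problems


if __name__ == "__main__":
    # Hypothetical capacities, for illustration only.
    workers = {"worker-a": ("4", "16374268Ki"), "worker-b": ("2", "8Gi")}
    for node, (cpu, mem) in workers.items():
        for problem in check_worker(node, cpu, mem):
            print(problem)
```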

I have built a k3s cluster on the Photon OS in my local environment:

  • Longhorn version: master-head (2024-03-14 42c0757)
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Kubectl
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: v1.23.2+k3s1
    • Number of control plane nodes in the cluster: 1
    • Number of worker nodes in the cluster: 3
  • Node config
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): KVM
  • Number of Longhorn volumes in the cluster:

The full regression test is now running, and I am waiting for the result:

root@photon-01 [ ~ ]# kl get no
NAME          STATUS   ROLES                       AGE     VERSION
photon-01     Ready    control-plane,etcd,master   4h12m   v1.23.2+k3s1
photon-wk01   Ready    <none>                      4h12m   v1.23.2+k3s1
photon-wk02   Ready    <none>                      4h12m   v1.23.2+k3s1
photon-wk03   Ready    <none>                      4h12m   v1.23.2+k3s1
root@photon-01 [ ~ ]# cat /etc/os-release
NAME="VMware Photon OS"
VERSION="5.0"
ID=photon
VERSION_ID=5.0
PRETTY_NAME="VMware Photon OS/Linux"
ANSI_COLOR="1;34"
HOME_URL="https://vmware.github.io/photon/"
BUG_REPORT_URL="https://github.com/vmware/photon/issues"
root@photon-01 [ ~ ]# uname -ar
Linux photon-01 6.1.10-10.ph5-rt #1-photon SMP PREEMPT_RT Mon Apr 24 23:57:41 UTC 2023 x86_64 GNU/Linux
root@photon-01 [ ~ ]# k logs longhorn-test -c longhorn-test -f
============================= test session starts ==============================
platform linux -- Python 3.11.8, pytest-6.2.4, py-1.11.0, pluggy-0.13.1 -- /usr/bin/python3.11
cachedir: .pytest_cache
rootdir: /integration, configfile: pytest.ini
plugins: repeat-0.9.1, order-1.0.1
collecting ... collected 458 items

test_backing_image.py::test_backing_image_basic_operation PASSED         [  0%]
test_backing_image.py::test_backing_image_content PASSED                 [  0%]
test_backing_image.py::test_volume_basic_with_backing_image PASSED       [  0%]
test_backing_image.py::test_volume_iscsi_basic_with_backing_image PASSED [  0%]
test_backing_image.py::test_snapshot_with_backing_image PASSED           [  1%]
test_backing_image.py::test_snapshot_prune_with_backing_image PASSED     [  1%]
test_backing_image.py::test_snapshot_prune_and_coalesce_simultaneously_with_backing_image PASSED [  1%]
test_backing_image.py::test_backup_with_backing_image[s3] PASSED         [  1%]
test_backing_image.py::test_backup_with_backing_image[nfs] PASSED        [  1%]
test_backing_image.py::test_backup_labels_with_backing_image[s3] PASSED  [  2%]
test_backing_image.py::test_backup_labels_with_backing_image[nfs] PASSED [  2%]
test_backing_image.py::test_ha_simple_recovery_with_backing_image PASSED [  2%]
test_backing_image.py::test_ha_salvage_with_backing_image PASSED         [  2%]
test_backing_image.py::test_ha_backup_deletion_recovery[s3] PASSED       [  3%]
test_backing_image.py::test_ha_backup_deletion_recovery[nfs] PASSED      [  3%]
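With 458 items collected, a small script can tally the outcomes from a saved copy of this log. This is only a sketch that assumes pytest's verbose line format as shown above; `tally` is a made-up helper name, and the sample contains three lines copied from the log.

```python
import re
from collections import Counter

# Matches verbose pytest result lines such as
# "test_x.py::test_y[s3] PASSED   [  1%]".
RESULT_LINE = re.compile(
    r"^(?P<test>\S+::\S+)\s+(?P<outcome>PASSED|FAILED|ERROR|SKIPPED|XFAIL|XPASS)\b"
)

# Three lines copied from the log above.
SAMPLE = """\
test_backing_image.py::test_backing_image_basic_operation PASSED         [  0%]
test_backing_image.py::test_backup_with_backing_image[s3] PASSED         [  1%]
test_backing_image.py::test_ha_backup_deletion_recovery[nfs] PASSED      [  3%]
"""


def tally(log_text):
    """Count outcomes and collect the IDs of tests that FAILED or ERRORed."""
    counts = Counter()
    not_passed = []
    for line in log_text.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match is None:
            continue  # session header, blank line, or other noise
        counts[match["outcome"]] += 1
        if match["outcome"] in ("FAILED", "ERROR"):
            not_passed.append(match["test"])
    return counts, not_passed


if __name__ == "__main__":
    counts, not_passed = tally(SAMPLE)
    print(dict(counts))
    print(not_passed)
```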

From the GA page, if I understand correctly, only the x86_64 Real-Time ISO flavour comes with the real-time kernel parameters enabled (I tried tdnf install linux-rt to update the kernel on an EC2 instance, but the instance did not become ready after the reboot).
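To confirm which kernel flavour a node is actually running, one option is to look for the PREEMPT_RT marker in the uname output, as seen in the session on photon-01 earlier in this thread. The sketch below assumes that marker is a reliable indicator; `is_realtime_kernel` is a made-up helper name.

```python
import platform

# uname output copied from the photon-01 session in this thread.
UNAME_RT = (
    "Linux photon-01 6.1.10-10.ph5-rt #1-photon SMP PREEMPT_RT "
    "Mon Apr 24 23:57:41 UTC 2023 x86_64 GNU/Linux"
)


def is_realtime_kernel(uname_output):
    """Return True when the uname text contains the PREEMPT_RT build marker."""
    return "PREEMPT_RT" in uname_output.split()


if __name__ == "__main__":
    print(is_realtime_kernel(UNAME_RT))
    # Check the machine this script runs on, using the kernel version string.
    print(is_realtime_kernel(platform.version()))
```

Matching on whole words keeps other preemption markers, such as PREEMPT_DYNAMIC, from being counted as real-time.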

I currently have some issues setting up a cluster from the ISO file. I will continue working on this and may need more time, thank you.

Test results based on Photon OS 5.0 GA:

  • amd64: 8 failures
  • arm64: 4 failures

Failed test cases are:

In conclusion, there are no outstanding issues in the Longhorn regression tests on Photon OS 5.0.

cc @innobead