vHive: Deployment of Functions in vHive failing

Description

I am trying to set up vHive on a single-node cluster and get it working by deploying and then invoking functions, as described in the guide here. I am able to follow the steps manually, and all the Kubernetes pods are running as expected. However, when deploying functions using this link, I ran into the errors below.

System Configuration

lscpu output:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   48 bits physical, 48 bits virtual
CPU(s):                          256
On-line CPU(s) list:             0-255
Thread(s) per core:              2
Core(s) per socket:              64
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       AuthenticAMD
CPU family:                      25
Model:                           1
Model name:                      AMD Eng Sample: 100-000000314-02_30/16_N
Stepping:                        0
Frequency boost:                 enabled
CPU MHz:                         1600.000
CPU max MHz:                     3000.0000
CPU min MHz:                     1200.0000
BogoMIPS:                        3193.90
Virtualization:                  AMD-V
L1d cache:                       4 MiB
L1i cache:                       4 MiB
L2 cache:                        64 MiB
L3 cache:                        512 MiB
NUMA node0 CPU(s):               0-63,128-191
NUMA node1 CPU(s):               64-127,192-255
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full AMD retpoline, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
                                 pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma
                                 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a
                                 misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3
                                 hw_pstate ssbd mba ibrs ibpb stibp vmmcall erms xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local
                                 clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists
                                 pausefilter pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm

cat /etc/os-release output:

NAME="Ubuntu"
VERSION="20.04.1 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.1 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Logs

vHive logs:

time="2022-05-19T12:15:03.479476427Z" level=error msg="coordinator failed to start VM" error="failed to create the microVM in firecracker-containerd: rpc error: code = Unknown desc = failed to create VM: failed to start the VM: [PUT /actions][400] createSyncActionBadRequest  &{FaultMessage:Internal error while starting microVM: VcpuConfigure(CpuId(InvalidVendor))}" image="ghcr.io/ease-lab/helloworld:var_workload" vmID=1
time="2022-05-19T12:15:03.479560085Z" level=error msg="failed to start VM" error="failed to create the microVM in firecracker-containerd: rpc error: code = Unknown desc = failed to create VM: failed to start the VM: [PUT /actions][400] createSyncActionBadRequest  &{FaultMessage:Internal error while starting microVM: VcpuConfigure(CpuId(InvalidVendor))}"
time="2022-05-19T12:15:03.482077842Z" level=error msg="VM config for pod d021d0b8ad35ac3cc8d9a0f8202e91dbc2c09081413cf2352a27717df00ed033 does not exist"
time="2022-05-19T12:15:03.482101657Z" level=error error="VM config for pod does not exist"
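To compare failures between runs, it helps to pull the Firecracker fault out of these log lines. The following is a hypothetical helper, not part of vHive; it assumes the logfmt-style format shown above, with no escaped quotes inside quoted values.

```python
import re
from typing import Dict, Optional

# One logfmt pair: key="quoted value" or key=bare_value.
# Assumes quoted values contain no escaped quotes (true for the lines above).
_PAIR = re.compile(r'(\w+)=("([^"]*)"|\S+)')
# In the lines above, the fault text appears as &{FaultMessage:...}.
_FAULT = re.compile(r"FaultMessage:(.*?)\}")


def parse_logfmt(line: str) -> Dict[str, str]:
    """Split a logfmt-style log line into its key/value fields."""
    fields: Dict[str, str] = {}
    for key, raw, quoted in _PAIR.findall(line):
        fields[key] = quoted if raw.startswith('"') else raw
    return fields


def fault_message(line: str) -> Optional[str]:
    """Return the FaultMessage inside the `error` field, if there is one."""
    match = _FAULT.search(parse_logfmt(line).get("error", ""))
    return match.group(1).strip() if match else None
```

Applied to the first log line above, `fault_message` returns "Internal error while starting microVM: VcpuConfigure(CpuId(InvalidVendor))"; lines without a FaultMessage, such as the "VM config for pod does not exist" ones, return None.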

(I initially hit the same issue as #476. I then applied the solution proposed in that ticket; the logs above were captured after applying it.)

Notes

There is a similar issue mentioned here. It seems to be a firecracker-containerd issue affecting non-Intel vendors, which appears to have been fixed later (according to that issue). I am not sure whether the firecracker-containerd binary used in vHive is the latest one. When I clone the latest firecracker-containerd repo, install it, and replace the /vhive/bin/firecracker-containerd binary with the one I built, the vHive error log is reduced to:

time="2022-05-19T06:31:18.406741266Z" level=error msg="failed to start VM" error="failed to create the microVM in firecracker-containerd: rpc error: code = Unknown desc = failed to create VM: failed to build VM configuration: no such file or directory"
time="2022-05-19T06:31:18.409782917Z" level=error msg="VM config for pod 84c5ce4eb538a061c3f75497a2b9f8688dc4cbfa351478a81691b05e4e59ff43 does not exist"
time="2022-05-19T06:31:18.409806492Z" level=error error="VM config for pod does not exist"
time="2022-05-19T06:31:36.204002382Z" level=warning msg="Failed to Fetch k8s dns clusterIP exit status 1\nThe connection to the server localhost:8080 was refused - did you specify the right host or port?\n\n"
time="2022-05-19T06:31:36.204047106Z" level=warning msg="Using google dns 8.8.8.8\n"
time="2022-05-19T06:31:36.350628233Z" level=error msg="coordinator failed to start VM" error="failed to create the microVM in firecracker-containerd: rpc error: code = Unknown desc = failed to create VM: failed to build VM configuration: no such file or directory" image="vhiveease/rnn_serving:var_workload" vmID=263
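The second failure, "failed to build VM configuration: no such file or directory", suggests that some file the runtime expects could not be found. The sketch below is hypothetical: it assumes a JSON runtime config in the style of firecracker-containerd's getting-started guide (commonly /etc/containerd/firecracker-runtime.json), with keys such as firecracker_binary_path, kernel_image_path, root_drive and cpu_template; vHive's actual file location and keys may differ. It also flags a T2 or C3 cpu_template on a non-Intel host, under the assumption that these Firecracker CPU templates target Intel CPUs.

```python
import json
import os
from typing import Dict, List

# Assumed keys holding file paths in a firecracker-containerd style
# runtime config; adjust to whatever your installation actually uses.
PATH_KEYS = ("firecracker_binary_path", "kernel_image_path", "root_drive")
# Assumption: Firecracker's T2 and C3 CPU templates are Intel-specific.
INTEL_ONLY_TEMPLATES = {"T2", "C3"}


def check_runtime_config(config: Dict[str, object], vendor: str) -> List[str]:
    """Return human-readable problems found in a parsed runtime config."""
    problems: List[str] = []
    for key in PATH_KEYS:
        path = config.get(key)
        if not path:
            problems.append(f"{key}: not set")
        elif not os.path.exists(str(path)):
            problems.append(f"{key}: missing file {path}")
    template = str(config.get("cpu_template", ""))
    if template in INTEL_ONLY_TEMPLATES and vendor != "GenuineIntel":
        problems.append(f"cpu_template {template} may not apply to {vendor}")
    return problems


def check_runtime_config_file(path: str, vendor: str) -> List[str]:
    """Load the JSON runtime config at `path` and check it."""
    with open(path) as f:
        return check_runtime_config(json.load(f), vendor)
```

For this host the vendor argument would be AuthenticAMD, the Vendor ID reported by lscpu. An empty result only means the listed paths exist; it does not rule out other missing files.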

I have also gone through #525 and have access to /dev/kvm. I am running on a bare-metal x86_64 AMD server with Ubuntu 20.04.
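The /dev/kvm check can be scripted; Firecracker needs read and write access to that device. This is a minimal sketch with a hypothetical helper name, meant as a quick sanity check rather than a full diagnosis.

```python
import os


def kvm_accessible(path: str = "/dev/kvm") -> bool:
    """Return True if `path` exists and this process can read and write it.

    Note that os.access checks the real (not effective) user and group IDs.
    """
    return os.path.exists(path) and os.access(path, os.R_OK | os.W_OK)


if __name__ == "__main__":
    print("/dev/kvm accessible:", kvm_accessible())
```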

Expected Behavior

Functions should be deployed normally.

Steps to reproduce

Follow the start-up guide to set up a one-node cluster, then run the deployer.

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 21 (21 by maintainers)

Most upvoted comments

@aditya2803 glad to hear! we always welcome improvements from the community 👍 please close the Issue if it's resolved