marblerun: Edgeless Runtime Container deployment fails

Issue description

I was deploying the Edgeless container runtime. My own code, deployed in a container, kept restarting, so I tried deploying ghcr.io/edgelesssys/edgelessrt-deploy:latest by itself. It shows the same behavior.

To reproduce

Steps to reproduce the behavior:

  1. Pod YAML (pod-test.yaml):
apiVersion: v1
kind: Pod
metadata:
  name: static-web
spec:
  containers:
    - name: web
      image: ghcr.io/edgelesssys/edgelessrt-deploy:latest
  2. kubectl apply -f pod-test.yaml
  3. kubectl describe pods static-web
Name:         static-web
Namespace:    default
Priority:     0
Node:         <node-name>
Start Time:   Fri, 06 Aug 2021 21:10:32 +0000
Labels:       <none>
Annotations:  <none>
Status:       Running
IP:           <ip>
IPs:
  IP:  <ip>
Containers:
  web:
    Container ID:   containerd://f6d96345df7803b1725ad40b19dd5aa66b7628c5fe37bb247ad4557c28c428da
    Image:          ghcr.io/edgelesssys/edgelessrt-deploy:latest
    Image ID:       ghcr.io/edgelesssys/edgelessrt-deploy@sha256:d622febf6c92c7a0062fea1dee20f5d0a35a386167888a39936129df87466cf3
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 06 Aug 2021 21:16:19 +0000
      Finished:     Fri, 06 Aug 2021 21:16:19 +0000
    Ready:          False
    Restart Count:  6
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-tkhr8 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-tkhr8:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-tkhr8
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  6m10s                  default-scheduler  Successfully assigned default/static-web to <node-name>
  Normal   Pulled     6m9s                   kubelet            Successfully pulled image "ghcr.io/edgelesssys/edgelessrt-deploy:latest" in 520.117841ms
  Normal   Pulled     6m8s                   kubelet            Successfully pulled image "ghcr.io/edgelesssys/edgelessrt-deploy:latest" in 204.802328ms
  Normal   Pulled     5m53s                  kubelet            Successfully pulled image "ghcr.io/edgelesssys/edgelessrt-deploy:latest" in 236.375905ms
  Normal   Created    5m24s (x4 over 6m9s)   kubelet            Created container web
  Normal   Started    5m24s (x4 over 6m9s)   kubelet            Started container web
  Normal   Pulled     5m24s                  kubelet            Successfully pulled image "ghcr.io/edgelesssys/edgelessrt-deploy:latest" in 206.955528ms
  Normal   Pulling    4m34s (x5 over 6m10s)  kubelet            Pulling image "ghcr.io/edgelesssys/edgelessrt-deploy:latest"
  Warning  BackOff    58s (x25 over 6m7s)    kubelet            Back-off restarting failed container

Expected behavior

Environment:

  • Marblerun version: latest
  • Edgeless RT version: container
  • Go version: container
  • Kubernetes version: 1.21.3

Additional info / screenshots

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 24 (10 by maintainers)

Most upvoted comments

I am using the Azure SGX device plugin instead of the Intel one: https://github.com/Azure/aks-engine/blob/master/docs/topics/sgx.md#deploying-the-sgx-device-plugin.

In that case, your pods should request EPC memory through Azure's plugin:

apiVersion: v1
kind: Pod
metadata:
  name: <pod_name>
spec:
  containers:
    - name: <container_name>
      image: <your_image>
      resources:
        limits:
          kubernetes.azure.com/sgx_epc_mem_in_MiB: 10

As for Graphene not working, have you tried running your code outside of the Docker environment? That is, have you installed Graphene directly on your machine and managed to get any of its examples, or your own code, running? If not, something is wrong with your installation or setup, and I would suggest raising an issue with the Graphene project directly, as its developers are much more experienced with it and can probably provide much better help.

Well… Graphene cannot find python.manifest.sgx in the current working directory of the Docker environment.

It’s probably better to use an absolute path, or to specify WORKDIR before defining ENTRYPOINT.

If you do this, make sure WORKDIR actually contains python.manifest.sgx, which should be generated from python.manifest.template by running graphene-sgx-sign on it (the Makefile you linked actually does this).

Honestly, these are pretty basic mistakes. Graphene just cannot find the manifest file derived from the name of your entry point.
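To make the lookup concrete, here is a minimal Python sketch of how an entry-point name could map to a manifest path. It is modelled only on the graphene-sgx behaviour shown in this thread, not on Graphene's source; the helper name find_sgx_manifest and its cwd parameter are hypothetical.

```python
from pathlib import Path


def find_sgx_manifest(app: str, cwd: str = ".") -> Path:
    """Hypothetical helper: map an entry point to its signed manifest.

    Assumption (modelled on the graphene-sgx output in this thread): the
    loader appends ".manifest.sgx" to the application path and resolves a
    relative path against the current working directory.
    """
    candidate = Path(app + ".manifest.sgx")
    if not candidate.is_absolute():
        # A relative entry point such as "python" is looked up in cwd.
        candidate = Path(cwd) / candidate
    if not candidate.is_file():
        raise FileNotFoundError(
            f"Invalid application path specified ({candidate.name} does not exist)."
        )
    return candidate
```

With a relative entry point such as python, the result depends entirely on the working directory, which is why setting WORKDIR (or using an absolute path) matters inside a container.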

Just to give you an idea of where Graphene searches for the manifest file:

$ mkdir emptydir && cd emptydir
$ graphene-sgx python
Invalid application path specified (python.manifest.sgx does not exist).
The path should point to application configuration files, so that they can be
found after appending corresponding extensions.

$ touch python.manifest.sgx
$ graphene-sgx python
error: Enclave size not a power of two (an SGX-imposed requirement)
error: Parsing manifest failed
error: load_enclave() failed with error -22
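The second error is an SGX constraint: the enclave size must be a power of two. A minimal Python sketch of such a check follows; the helper names and the K/M/G size-suffix format are assumptions for illustration, not Graphene's actual manifest parser.

```python
# Hypothetical sketch: the "K"/"M"/"G" suffix convention is an assumption,
# not taken from Graphene's manifest parser.
UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text: str) -> int:
    """Convert a size string such as '256M' into bytes."""
    text = text.strip().upper()
    suffix = text[-1] if text and text[-1] in "KMG" else ""
    number = text[: len(text) - len(suffix)]
    return int(number) * UNITS[suffix]


def is_power_of_two(n: int) -> bool:
    """True if n is a positive power of two (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


def check_enclave_size(text: str) -> None:
    """Raise if the configured enclave size violates the SGX requirement."""
    size = parse_size(text)
    if not is_power_of_two(size):
        raise ValueError("Enclave size not a power of two (an SGX-imposed requirement)")
```

A value like 256M passes, while 300M would be rejected. The empty manifest created with touch above presumably specifies no usable size, which would explain why it trips this check.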

I really recommend going through this step by step on a local or virtual machine before putting it into a Dockerfile. This helps you evaluate whether your application actually works with Graphene, what the folder layout needs to look like, what to put into the manifest, what to use as ENTRYPOINT when you eventually write the Dockerfile, etc. Right now you are tweaking too many things at once without even getting anything to launch. Working that way can be painful, so please proceed step by step as I described above.

If you actually get something to launch with Graphene, whether it fails or not, that would be a step forward in understanding what you are doing and getting your project running. So please, don’t tweak too many things at once 😃