minishift: Webconsole Crashloop and Permission Issue

General information

  • Minishift version: minishift v1.24.0+8a904d0 / minishift v1.23
  • OS: macOS
  • Hypervisor: xhyve

Steps to reproduce

  1. minishift profile set demo (with 8 GB memory / 4 vCPUs)
  2. minishift start
  3. minishift console

Expected

The minishift console should open in the browser.

Actual

The browser opens the OpenShift console URL, but it returns HTTP 404. Running oc project openshift-web-console followed by oc get pods shows:

NAME                          READY     STATUS             RESTARTS   AGE
webconsole-6df7dd6b7b-6msxj   0/1       CrashLoopBackOff   8          18m

Logs

oc logs <webconsole pod>

W0917 15:45:19.252521       1 start.go:93] Warning: config.clusterInfo.loggingPublicURL: Invalid value: "": required to view aggregated container logs in the console, web console start will continue.
W0917 15:45:19.252616       1 start.go:93] Warning: config.clusterInfo.metricsPublicURL: Invalid value: "": required to view cluster metrics in the console, web console start will continue.
Error: unable to load server certificate: open /var/serving-cert/tls.crt: permission denied
Usage:
  origin-web-console [flags]
Flags:
      --alsologtostderr                                log to standard error as well as files
      --audit-log-format string                        Format of saved audits. "legacy" indicates 1-line text format for each event. "json" indicates structured json format. Requires the 'AdvancedAuditing' feature gate. Known formats are legacy,json. (default "json")
      --audit-log-maxage int                           The maximum number of days to retain old audit log files based on the timestamp encoded in their filename.
      --audit-log-maxbackup int                        The maximum number of old audit log files to retain.
      --audit-log-maxsize int                          The maximum size in megabytes of the audit log file before it gets rotated.
      --audit-log-path string                          If set, all requests coming to the apiserver will be logged to this file.  '-' means standard out.
      --audit-policy-file string                       Path to the file that defines the audit policy configuration. Requires the 'AdvancedAuditing' feature gate. With AdvancedAuditing, a profile is required to enable auditing.
      --audit-webhook-batch-buffer-size int            The size of the buffer to store events before batching and sending to the webhook. Only used in batch mode. (default 10000)
      --audit-webhook-batch-initial-backoff duration   The amount of time to wait before retrying the first failed requests. Only used in batch mode. (default 10s)
      --audit-webhook-batch-max-size int               The maximum size of a batch sent to the webhook. Only used in batch mode. (default 400)
      --audit-webhook-batch-max-wait duration          The amount of time to wait before force sending the batch that hadn't reached the max size. Only used in batch mode. (default 30s)
      --audit-webhook-batch-throttle-burst int         Maximum number of requests sent at the same moment if ThrottleQPS was not utilized before. Only used in batch mode. (default 15)
      --audit-webhook-batch-throttle-qps float32       Maximum average number of requests per second. Only used in batch mode. (default 10)
      --audit-webhook-config-file string               Path to a kubeconfig formatted file that defines the audit webhook configuration. Requires the 'AdvancedAuditing' feature gate.
      --audit-webhook-mode string                      Strategy for sending audit events. Blocking indicates sending events should block server responses. Batch causes the webhook to buffer and send events asynchronously. Known modes are batch,blocking. (default "batch")
      --config string                                  filename containing the WebConsoleConfig
      --contention-profiling                           Enable lock contention profiling, if profiling is enabled
      --enable-swagger-ui                              Enables swagger ui on the apiserver at /swagger-ui
      --log-flush-frequency duration                   Maximum number of seconds between log flushes (default 5s)
      --log_backtrace_at traceLocation                 when logging hits line file:N, emit a stack trace (default :0)
      --log_dir string                                 If non-empty, write log files in this directory
      --logtostderr                                    log to standard error instead of files (default true)
      --profiling                                      Enable profiling via web interface host:port/debug/pprof/ (default true)
      --stderrthreshold severity                       logs at or above this threshold go to stderr (default 2)
  -v, --v Level                                        log level for V logs
      --vmodule moduleSpec                             comma-separated list of pattern=N settings for file-filtered logging
F0917 15:45:19.253264       1 console.go:35] unable to load server certificate: open /var/serving-cert/tls.crt: permission denied

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 96 (56 by maintainers)

Most upvoted comments

I spent some time with @jcrossley3 and I believe we have a better understanding of what’s happening now.

TLDR: there seems to be a race that can result in the webconsole pod sometimes being matched against the anyuid scc when the anyuid addon is enabled.
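
For anyone checking their own setup, the quickest way I know to see which SCC a webconsole pod was admitted under is the openshift.io/scc annotation (the same annotation shown in the pod dumps below); a minimal sketch:

# Show the SCC each pod in the namespace was matched against
oc -n openshift-web-console get pods -o yaml | grep 'openshift.io/scc'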

As evidence, I submit the following:

Crashing pod:

apiVersion: v1
items:
- apiVersion: v1
  kind: Pod
  metadata:
    annotations:
      openshift.io/scc: anyuid
      operator.openshift.io/force: 37cf0301-f960-11e8-b122-0242ac110007
    creationTimestamp: 2018-12-06T14:07:18Z
    generateName: webconsole-5994fdd4b5-
    labels:
      app: openshift-web-console
      pod-template-hash: "1550988061"
      webconsole: "true"
    name: webconsole-5994fdd4b5-xj747
    namespace: openshift-web-console
    ownerReferences:
    - apiVersion: apps/v1
      blockOwnerDeletion: true
      controller: true
      kind: ReplicaSet
      name: webconsole-5994fdd4b5
      uid: 3dffff6a-f960-11e8-9173-5254007729e4
    resourceVersion: "2496"
    selfLink: /api/v1/namespaces/openshift-web-console/pods/webconsole-5994fdd4b5-xj747
    uid: 3e09c3a8-f960-11e8-9173-5254007729e4
  spec:
    containers:
    - args:
      - -v=0
      command:
      - /usr/bin/origin-web-console
      - --audit-log-path=-
      - --config=/var/webconsole-config/webconsole-config.yaml
      image: openshift/origin-web-console:v3.11.0
      imagePullPolicy: IfNotPresent
      livenessProbe:
        exec:
          command:
          - /bin/sh
          - -i
          - -c
          - |-
            if [[ ! -f /tmp/webconsole-config.hash ]]; then \
              md5sum /var/webconsole-config/webconsole-config.yaml > /tmp/webconsole-config.hash; \
            elif [[ $(md5sum /var/webconsole-config/webconsole-config.yaml) != $(cat /tmp/webconsole-config.hash) ]]; then \
              exit 1; \
            fi && curl -k -f https://0.0.0.0:8443/console/
        failureThreshold: 3
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 1
      name: webconsole
      ports:
      - containerPort: 8443
        protocol: TCP
      readinessProbe:
        failureThreshold: 3
        httpGet:
          path: /healthz
          port: 8443
          scheme: HTTPS
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 1
      resources:
        requests:
          cpu: 100m
          memory: 100Mi
      securityContext:
        capabilities:
          drop:
          - MKNOD
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
      - mountPath: /var/serving-cert
        name: serving-cert
      - mountPath: /var/webconsole-config
        name: webconsole-config
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: webconsole-token-jfrrs
        readOnly: true
    dnsPolicy: ClusterFirst
    imagePullSecrets:
    - name: webconsole-dockercfg-s2vjb
    nodeName: localhost
    priority: 0
    restartPolicy: Always
    schedulerName: default-scheduler
    securityContext:
      seLinuxOptions:
        level: s0:c9,c4
    serviceAccount: webconsole
    serviceAccountName: webconsole
    terminationGracePeriodSeconds: 30
    tolerations:
    - effect: NoSchedule
      key: node.kubernetes.io/memory-pressure
      operator: Exists
    volumes:
    - name: serving-cert
      secret:
        defaultMode: 400
        secretName: webconsole-serving-cert
    - configMap:
        defaultMode: 440
        name: webconsole-config
      name: webconsole-config
    - name: webconsole-token-jfrrs
      secret:
        defaultMode: 420
        secretName: webconsole-token-jfrrs
  status:
    conditions:
    - lastProbeTime: null
      lastTransitionTime: 2018-12-06T14:07:18Z
      status: "True"
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: 2018-12-06T14:07:18Z
      message: 'containers with unready status: [webconsole]'
      reason: ContainersNotReady
      status: "False"
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: null
      message: 'containers with unready status: [webconsole]'
      reason: ContainersNotReady
      status: "False"
      type: ContainersReady
    - lastProbeTime: null
      lastTransitionTime: 2018-12-06T14:07:18Z
      status: "True"
      type: PodScheduled
    containerStatuses:
    - containerID: docker://468ad0475256ad42fc556eba53d2cc171751d7e08ad037135bd2a90349bf2d1f
      image: docker.io/openshift/origin-web-console:v3.11.0
      imageID: docker-pullable://docker.io/openshift/origin-web-console@sha256:20c14ce54de73b9203f03b7ee19f81769b9f404bb7dd81ed0ee4a1b8baecf6d1
      lastState:
        terminated:
          containerID: docker://468ad0475256ad42fc556eba53d2cc171751d7e08ad037135bd2a90349bf2d1f
          exitCode: 255
          finishedAt: 2018-12-06T14:07:43Z
          reason: Error
          startedAt: 2018-12-06T14:07:43Z
      name: webconsole
      ready: false
      restartCount: 2
      state:
        waiting:
          message: Back-off 20s restarting failed container=webconsole pod=webconsole-5994fdd4b5-xj747_openshift-web-console(3e09c3a8-f960-11e8-9173-5254007729e4)
          reason: CrashLoopBackOff
    hostIP: 192.168.122.227
    phase: Running
    podIP: 172.17.0.8
    qosClass: Burstable
    startTime: 2018-12-06T14:07:18Z
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Working pod:

apiVersion: v1
items:
- apiVersion: v1
  kind: Pod
  metadata:
    annotations:
      openshift.io/scc: restricted
      operator.openshift.io/force: 7ce88558-f95b-11e8-b645-0242ac110007
    creationTimestamp: 2018-12-06T13:33:18Z
    generateName: webconsole-68868bdd48-
    labels:
      app: openshift-web-console
      pod-template-hash: "2442468804"
      webconsole: "true"
    name: webconsole-68868bdd48-5sxhw
    namespace: openshift-web-console
    ownerReferences:
    - apiVersion: apps/v1
      blockOwnerDeletion: true
      controller: true
      kind: ReplicaSet
      name: webconsole-68868bdd48
      uid: 7dedc942-f95b-11e8-9d9b-525400bf50da
    resourceVersion: "2357"
    selfLink: /api/v1/namespaces/openshift-web-console/pods/webconsole-68868bdd48-5sxhw
    uid: 7df3f4b5-f95b-11e8-9d9b-525400bf50da
  spec:
    containers:
    - args:
      - -v=0
      command:
      - /usr/bin/origin-web-console
      - --audit-log-path=-
      - --config=/var/webconsole-config/webconsole-config.yaml
      image: openshift/origin-web-console:v3.11.0
      imagePullPolicy: IfNotPresent
      livenessProbe:
        exec:
          command:
          - /bin/sh
          - -i
          - -c
          - |-
            if [[ ! -f /tmp/webconsole-config.hash ]]; then \
              md5sum /var/webconsole-config/webconsole-config.yaml > /tmp/webconsole-config.hash; \
            elif [[ $(md5sum /var/webconsole-config/webconsole-config.yaml) != $(cat /tmp/webconsole-config.hash) ]]; then \
              exit 1; \
            fi && curl -k -f https://0.0.0.0:8443/console/
        failureThreshold: 3
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 1
      name: webconsole
      ports:
      - containerPort: 8443
        protocol: TCP
      readinessProbe:
        failureThreshold: 3
        httpGet:
          path: /healthz
          port: 8443
          scheme: HTTPS
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 1
      resources:
        requests:
          cpu: 100m
          memory: 100Mi
      securityContext:
        capabilities:
          drop:
          - KILL
          - MKNOD
          - SETGID
          - SETUID
        runAsUser: 1000120000
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
      - mountPath: /var/serving-cert
        name: serving-cert
      - mountPath: /var/webconsole-config
        name: webconsole-config
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: webconsole-token-b8bf6
        readOnly: true
    dnsPolicy: ClusterFirst
    imagePullSecrets:
    - name: webconsole-dockercfg-tz2fh
    nodeName: localhost
    priority: 0
    restartPolicy: Always
    schedulerName: default-scheduler
    securityContext:
      fsGroup: 1000120000
      seLinuxOptions:
        level: s0:c11,c5
    serviceAccount: webconsole
    serviceAccountName: webconsole
    terminationGracePeriodSeconds: 30
    tolerations:
    - effect: NoSchedule
      key: node.kubernetes.io/memory-pressure
      operator: Exists
    volumes:
    - name: serving-cert
      secret:
        defaultMode: 400
        secretName: webconsole-serving-cert
    - configMap:
        defaultMode: 440
        name: webconsole-config
      name: webconsole-config
    - name: webconsole-token-b8bf6
      secret:
        defaultMode: 420
        secretName: webconsole-token-b8bf6
  status:
    conditions:
    - lastProbeTime: null
      lastTransitionTime: 2018-12-06T13:33:18Z
      status: "True"
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: 2018-12-06T13:33:25Z
      status: "True"
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: null
      status: "True"
      type: ContainersReady
    - lastProbeTime: null
      lastTransitionTime: 2018-12-06T13:33:18Z
      status: "True"
      type: PodScheduled
    containerStatuses:
    - containerID: docker://a59d1966870be91a6faa37ef4b10830a00753cee08b084244c633e8e58255f4e
      image: docker.io/openshift/origin-web-console:v3.11.0
      imageID: docker-pullable://docker.io/openshift/origin-web-console@sha256:20c14ce54de73b9203f03b7ee19f81769b9f404bb7dd81ed0ee4a1b8baecf6d1
      lastState: {}
      name: webconsole
      ready: true
      restartCount: 0
      state:
        running:
          startedAt: 2018-12-06T13:33:19Z
    hostIP: 192.168.122.49
    phase: Running
    podIP: 172.17.0.8
    qosClass: Burstable
    startTime: 2018-12-06T13:33:18Z
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Differences between working and not working:

The working pod is matched against the restricted SCC, which has the following effects:

  • runAsUser is set to an allocated UID for the containers
  • selinuxOptions is set on the pod-level securityContext
  • fsGroup is set on pod-level securityContext

The non-working pod is matched against the anyuid SCC, which means that:

  • runAsUser is not set (the container runs as the image’s USER)
  • fsGroup is not set
  • (note: per the dump above, seLinuxOptions is actually still set at the pod level, just with a different level)

The origin-web-console image sets USER to 1001. Something about the required permissions is wrong when the pod is matched against the anyuid SCC; most likely that non-root user simply cannot read the mounted serving-cert secret without the runAsUser/fsGroup settings the restricted SCC applies, which matches the permission denied on /var/serving-cert/tls.crt.
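
(If you want to double-check the USER baked into the image, a small sketch, assuming the image is available to a local docker daemon, e.g. inside the minishift VM via minishift ssh:)

# Print the USER configured in the web console image
docker inspect --format '{{.Config.User}}' openshift/origin-web-console:v3.11.0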

Note: in both of these scenarios, the fix for openshift/origin#21250 is not applied; ie, the defaultMode is still set on the pod’s volumes. This tells me that the oc used to stand up the webconsole does not have the fix.

The real question here is why the web console pod is sometimes correctly matched against restricted and sometimes incorrectly matched against anyuid. I characterize the match against anyuid as incorrect because the pod descriptor being created does not use any features that would warrant being matched against anyuid. That points to a race condition in the SCC admission controller.
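
A rough way to poke at that theory is to force the ReplicaSet to recreate the pod and see whether the replacement lands on a different SCC; a sketch, using the webconsole=true label from the dumps above:

# Delete the current webconsole pod(s); the ReplicaSet will recreate them
oc -n openshift-web-console delete pod -l webconsole=true
# Check which SCC the replacement was matched against
oc -n openshift-web-console get pods -o yaml | grep 'openshift.io/scc'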

Third time is the charm:

openshift-web-console           webconsole-67bbf79f57-pvdcj                               0/1       CrashLoopBackOff   2          34s

disabling anyuid

This is documented and recorded as a known issue. It is an addon that causes more harm than convenience.

On Sun, Mar 8, 2020 at 11:28 AM andresmmujica notifications@github.com wrote:

removing anyuid addon and recreating the webconsole pods, works too.

This should be recorded as a Known Issue for the release: anyuid can cause Permission Denied issues on startup. If this happens, please disable the addon and apply it again after the cluster deployment has succeeded, if necessary.

@kameshsampath I suggest you disable this addon for the time being, and only use apply to enable the addon after the cluster has come up, but allow enough time for the webconsole to come up (see the sketch below). We are looking into a fix, but I think for the time being this is going to be documented as a known issue.

/cc: @robin-owen @LalatenduMohanty @praveenkumar
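
For reference, the suggested workaround boils down to something like this (a sketch; anyuid is the addon name as shipped with minishift):

# Keep the anyuid addon from being applied on the next start
minishift addons disable anyuid
minishift start
# ...wait until the webconsole pod is Running, then apply the addon
# to the already-running cluster if you still need it:
minishift addons apply anyuid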

Just came back from lunch and will run it some additional times (I made a copy of the failing VM). So, at the moment we both run at least the same versions… and both have anyuid enabled.

YAY! We’re not crazy! Finally! 😃

Hmmm… I also get now:

docker.io/openshift/origin-haproxy-router                v3.11.0             a9b61417aa4c        2 days ago          407 MB
<none>                                                   <none>              5b204a8da075        2 days ago          825 MB
docker.io/openshift/origin-docker-registry               v3.11.0             ed683ef24244        6 weeks ago         305 MB

These aren’t the image IDs I got before. Interesting… maybe related to Docker Hub and their caching… but 2 days already? Hmm… anyway, the web-console has been Running happily for 6m:

openshift-web-console           webconsole-cc9cf8675-2z9zd                                1/1       Running     0          6m

I do not have addons enabled, so I will run with… anyuid tends to change behaviour quite a bit.

If I compare the oc binary downloaded by minishift and the one I can download from origin releases (diff ~/.minishift/cache/oc/v3.11.0/linux/oc /usr/bin/oc), I see no difference.

And since the patch was merged on November 1st and the release is from October, it would never have worked.

I wonder if compiling an oc binary from the OpenShift 3.11.x branch and replacing the one downloaded by minishift would finally solve this issue…
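
(In case anyone tries that: the cached binary lives at the path from the diff above. A rough sketch; the build output location is only my assumption about origin’s default make output:)

# Hypothetical: swap in a locally built oc for the one minishift cached.
# _output/local/bin/linux/amd64/oc is assumed to be where origin's build drops the binary.
cp _output/local/bin/linux/amd64/oc ~/.minishift/cache/oc/v3.11.0/linux/oc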

Hi folks, I’ve erased .minishift and ran minishift delete && minishift start 8 times this morning before the web-console came up. It randomly works.
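
If anyone else wants to reproduce, a loop along these lines saves some typing (a sketch; it assumes oc from minishift oc-env is on PATH and that you log in as system:admin to be able to see the openshift-web-console namespace):

for i in $(seq 1 8); do
  minishift delete --force
  minishift start
  oc login -u system:admin
  oc -n openshift-web-console get pods
done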

Is the patch included in origin tag v3.11.0?

I just blew away my ~/.minishift and my webconsole is still crashlooping with minishift 1.27 and openshift v3.11.0. Hopefully this gist contains the relevant details: https://gist.github.com/8702dae7b39320ebcd0dcbe84ed3798f

I think we’d all be eternally grateful if you could please tell us in detail what we’re doing wrong, @praveenkumar

Should we be using something other than v3.11.0?

I have a similar problem with oc cluster up.

oc version
oc v3.11.16
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://192.168.0.20:8443
kubernetes v1.11.0+d4cacc0

oc debug deployment/webconsole
Defaulting container name to webconsole.
Use 'oc describe pod/webconsole-debug -n openshift-web-console' to see all of the containers in this pod.

Debugging with pod/webconsole-debug, original command: /usr/bin/origin-web-console --audit-log-path=- --config=/var/webconsole-config/webconsole-config.yaml -v=0
Waiting for pod to start …
If you don't see a command prompt, try pressing enter.
sh-4.2$ /usr/bin/origin-web-console --config=/var/webconsole-config/webconsole-config.yaml
W1014 09:48:37.544135      13 start.go:93] Warning: config.clusterInfo.loggingPublicURL: Invalid value: "": required to view aggregated container logs in the console, web console start will continue.
W1014 09:48:37.544268      13 start.go:93] Warning: config.clusterInfo.metricsPublicURL: Invalid value: "": required to view cluster metrics in the console, web console start will continue.
Error: unable to load server certificate: open /var/serving-cert/tls.crt: permission denied

Found this thread, https://github.com/openshift/origin-web-console-server/issues/37, and edited the deploy object for the webconsole to change the permissions on the secret/volume to 444:

oc edit deploy webconsole
deployment.extensions/webconsole edited

and now it works.

oc get pods
NAME                          READY     STATUS        RESTARTS   AGE
webconsole-69b58997df-8bsd2   0/1       Terminating   0          3h
webconsole-69b58997df-vps67   1/1       Running       0          13m

oc logs webconsole-69b58997df-vps67
W1014 09:53:24.021302       1 start.go:93] Warning: config.clusterInfo.loggingPublicURL: Invalid value: "": required to view aggregated container logs in the console, web console start will continue.
W1014 09:53:24.021394       1 start.go:93] Warning: config.clusterInfo.metricsPublicURL: Invalid value: "": required to view cluster metrics in the console, web console start will continue.
I1014 09:53:24.189322       1 start.go:208] OpenShift Web Console Version: v3.11.16
I1014 09:53:24.189669       1 serve.go:89] Serving securely on 0.0.0.0:8443
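
For reference, the same change can be made non-interactively; a sketch, assuming the serving-cert secret is the first entry in the deployment's volumes list (as in the pod dumps above). Note that defaultMode is a plain decimal integer in the API (292 == 0444 octal, i.e. readable by everyone), and that whatever manages the webconsole deployment may eventually reconcile the change away:

# Make the serving-cert secret world-readable inside the pod (292 decimal == 0444 octal)
oc -n openshift-web-console patch deployment webconsole --type=json \
  -p '[{"op": "replace", "path": "/spec/template/spec/volumes/0/secret/defaultMode", "value": 292}]'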