minishift: Webconsole Crashloop and Permission Issue
General information
- Minishift version: minishift v1.24.0+8a904d0 / minishift v1.23
- OS: macOS
- Hypervisor: xhyve
Steps to reproduce
- minishift profile set demo (with memory 8GB / vcpus 4)
- minishift start
- minishift console
Expected
The minishift console should open in the browser.
Actual
The browser opens the OpenShift console URL but gets an HTTP 404. Running oc project openshift-web-console followed by oc get pods shows:
NAME                          READY   STATUS             RESTARTS   AGE
webconsole-6df7dd6b7b-6msxj   0/1     CrashLoopBackOff   8          18m
Logs
oc logs <webconsole pod>
W0917 15:45:19.252521 1 start.go:93] Warning: config.clusterInfo.loggingPublicURL: Invalid value: "": required to view aggregated container logs in the console, web console start will continue.
W0917 15:45:19.252616 1 start.go:93] Warning: config.clusterInfo.metricsPublicURL: Invalid value: "": required to view cluster metrics in the console, web console start will continue.
Error: unable to load server certificate: open /var/serving-cert/tls.crt: permission denied
Usage:
origin-web-console [flags]
Flags:
--alsologtostderr log to standard error as well as files
--audit-log-format string Format of saved audits. "legacy" indicates 1-line text format for each event. "json" indicates structured json format. Requires the 'AdvancedAuditing' feature gate. Known formats are legacy,json. (default "json")
--audit-log-maxage int The maximum number of days to retain old audit log files based on the timestamp encoded in their filename.
--audit-log-maxbackup int The maximum number of old audit log files to retain.
--audit-log-maxsize int The maximum size in megabytes of the audit log file before it gets rotated.
--audit-log-path string If set, all requests coming to the apiserver will be logged to this file. '-' means standard out.
--audit-policy-file string Path to the file that defines the audit policy configuration. Requires the 'AdvancedAuditing' feature gate. With AdvancedAuditing, a profile is required to enable auditing.
--audit-webhook-batch-buffer-size int The size of the buffer to store events before batching and sending to the webhook. Only used in batch mode. (default 10000)
--audit-webhook-batch-initial-backoff duration The amount of time to wait before retrying the first failed requests. Only used in batch mode. (default 10s)
--audit-webhook-batch-max-size int The maximum size of a batch sent to the webhook. Only used in batch mode. (default 400)
--audit-webhook-batch-max-wait duration The amount of time to wait before force sending the batch that hadn't reached the max size. Only used in batch mode. (default 30s)
--audit-webhook-batch-throttle-burst int Maximum number of requests sent at the same moment if ThrottleQPS was not utilized before. Only used in batch mode. (default 15)
--audit-webhook-batch-throttle-qps float32 Maximum average number of requests per second. Only used in batch mode. (default 10)
--audit-webhook-config-file string Path to a kubeconfig formatted file that defines the audit webhook configuration. Requires the 'AdvancedAuditing' feature gate.
--audit-webhook-mode string Strategy for sending audit events. Blocking indicates sending events should block server responses. Batch causes the webhook to buffer and send events asynchronously. Known modes are batch,blocking. (default "batch")
--config string filename containing the WebConsoleConfig
--contention-profiling Enable lock contention profiling, if profiling is enabled
--enable-swagger-ui Enables swagger ui on the apiserver at /swagger-ui
--log-flush-frequency duration Maximum number of seconds between log flushes (default 5s)
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory
--logtostderr log to standard error instead of files (default true)
--profiling Enable profiling via web interface host:port/debug/pprof/ (default true)
--stderrthreshold severity logs at or above this threshold go to stderr (default 2)
-v, --v Level log level for V logs
--vmodule moduleSpec comma-separated list of pattern=N settings for file-filtered logging
F0917 15:45:19.253264 1 console.go:35] unable to load server certificate: open /var/serving-cert/tls.crt: permission denied
About this issue
- State: closed
- Created 6 years ago
- Comments: 96 (56 by maintainers)
Commits related to this issue
- Change file permissions on console serving cert Have a look to https://github.com/openshift/origin-web-console-server/issues/37 and https://github.com/openshift/openshift-ansible/pull/8558/files , Th... — committed to praveenkumar/origin by praveenkumar 6 years ago
- Change file permissions on console serving cert Have a look to https://github.com/openshift/origin-web-console-server/issues/37 and https://github.com/openshift/openshift-ansible/pull/8558/files ... — committed to praveenkumar/origin by praveenkumar 6 years ago
- Change file permissions on console serving cert Have a look to https://github.com/openshift/origin-web-console-server/issues/37 and https://github.com/openshift/openshift-ansible/pull/8558/files ... — committed to praveenkumar/origin by praveenkumar 6 years ago
- Change file permissions on console serving cert Have a look to https://github.com/openshift/origin-web-console-server/issues/37 and https://github.com/openshift/openshift-ansible/pull/8558/files ... — committed to praveenkumar/origin by praveenkumar 6 years ago
- Issue #2809 Apply anyuid scc only to myproject namespace — committed to anjannath/minishift by anjannath 5 years ago
I spent some time with @jcrossley3 and I believe we have a better understanding of what’s happening now.
TLDR: there seems to be a race that can result in the webconsole pod sometimes being matched against the anyuid SCC when the anyuid addon is enabled.
As evidence, I submit the following:
Crashing pod:
Working pod:
Differences between working and not working:
The working pod is matched against the restricted SCC, which has the following effects:
- runAsUser is set to an allocated UID for the containers
- selinuxOptions is set on the pod-level securityContext
- fsGroup is set on the pod-level securityContext
The non-working pod is matched against the anyuid SCC, which means that:
- selinuxOptions is not set
- runAsUser is not set
- fsGroup is not set
The origin-web-console image sets USER to 1001. Something about the permissions required is incorrect when the pod is matched against the anyuid SCC.
Note: in both of these scenarios, the fix for openshift/origin#21250 is not applied; i.e., the defaultMode is still set on the pod’s volumes. This tells me that the oc used to stand up the webconsole does not have the fix.
The real question here is why the web console pod is sometimes correctly matched against restricted and sometimes incorrectly matched against anyuid. I characterize the match against anyuid as incorrect because the pod descriptor being created does not use any features that would warrant being matched against anyuid. That points to a race condition in the SCC admission controller.
Third time is the charm:
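To check which SCC admitted a given webconsole pod, as in the working versus crashing comparison in this thread, one option is to read the openshift.io/scc annotation that OpenShift records on admitted pods. The helper below is a minimal sketch: pod_scc is a hypothetical function name, the pod name is supplied by the caller, and it assumes the current oc login is allowed to read the openshift-web-console namespace.

```shell
# Print the SCC that admitted a pod, read from the openshift.io/scc annotation.
# pod_scc is a hypothetical helper name, not part of oc or minishift.
# The namespace defaults to the one used in this issue.
pod_scc() {
  pod="$1"
  ns="${2:-openshift-web-console}"
  oc get pod "$pod" -n "$ns" \
    -o jsonpath='{.metadata.annotations.openshift\.io/scc}'
}
```

Going by the comparison in this thread, calling pod_scc with the name of a crashing pod would be expected to print anyuid, and with a working pod restricted.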
This is documented as a known issue. It is an addon that causes more harm than convenience.
This should be recorded as a Known Issue for the release: anyuid can cause Permission Denied issues on startup. If this happens, please disable the addon and, when necessary, apply it after the cluster deployment has succeeded.
@kameshsampath I suggest you disable this addon for the time being, and only use apply to enable the addon after the cluster has come up, allowing enough time for the webconsole to come up first. We are looking into a fix, but I think for the time being this is going to be documented as a known issue.
/cc: @robin-owen @LalatenduMohanty @praveenkumar
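The suggested workaround (disable the addon, start the cluster, let the web console come up, then apply the addon) could be scripted roughly as follows. This is a sketch under assumptions: start_then_apply_anyuid is a hypothetical function name, oc is assumed to be on the PATH, and the oc login in effect after minishift start is assumed to be allowed to read the openshift-web-console namespace (for example system:admin).

```shell
# Sketch of the workaround: start without anyuid, wait for the web console
# deployment to finish rolling out, then apply the addon.
# start_then_apply_anyuid is a hypothetical name, not a minishift command.
start_then_apply_anyuid() {
  minishift addons disable anyuid &&
  minishift start &&
  oc rollout status deployment/webconsole -n openshift-web-console &&
  minishift addons apply anyuid
}
```

Chaining with && means the addon is only applied if every earlier step succeeded, so a web console that never rolls out leaves the addon unapplied.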
Just came back from lunch and will run it some additional times (made a copy of the failing VM). So, at the moment we both run at least the same versions… and both have anyuid enabled.
YAY! We’re not crazy! Finally! 😃
Hmmm… I also get now:
These aren’t the image IDs I got before. Interesting… maybe related to Docker Hub and their caching… but 2 days already…? Hmmm… anyway, web-console is Running happily for 6m:
I do not have addons enabled, so I will run with… anyuid tends to change behaviour quite a bit.
If I compare the oc binary downloaded by minishift and the one I can download from origin releases (diff ~/.minishift/cache/oc/v3.11.0/linux/oc /usr/bin/oc), I see no difference.
And since the patch was merged on November 1st and the release is from October, it would never work.
I wonder if compiling an oc binary from the OpenShift 3.11.x branch and replacing the one downloaded by minishift would finally solve this issue…
Hi folks, I’ve erased .minishift and ran minishift delete && minishift start 8 times this morning before having the web-console running. It randomly works.
Is the patch included in origin tag v3.11.0?
I just blew away my ~/.minishift and my webconsole is still crashlooping with minishift 1.27 and openshift v3.11.0. Hopefully this gist contains the relevant details: https://gist.github.com/8702dae7b39320ebcd0dcbe84ed3798f
I think we’d all be eternally grateful if you could please tell us in detail what we’re doing wrong, @praveenkumar
Should we be using something other than v3.11.0?
I have a similar problem on oc cluster up
oc version
oc v3.11.16
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://192.168.0.20:8443
kubernetes v1.11.0+d4cacc0
oc debug deployment/webconsole
Defaulting container name to webconsole.
Use 'oc describe pod/webconsole-debug -n openshift-web-console' to see all of the containers in this pod.
Debugging with pod/webconsole-debug, original command: /usr/bin/origin-web-console --audit-log-path=- --config=/var/webconsole-config/webconsole-config.yaml -v=0
Waiting for pod to start ...
If you don't see a command prompt, try pressing enter.
sh-4.2$
sh-4.2$
sh-4.2$ /usr/bin/origin-web-console --config=/var/webconsole-config/webconsole-config.yaml
W1014 09:48:37.544135 13 start.go:93] Warning: config.clusterInfo.loggingPublicURL: Invalid value: "": required to view aggregated container logs in the console, web console start will continue.
W1014 09:48:37.544268 13 start.go:93] Warning: config.clusterInfo.metricsPublicURL: Invalid value: "": required to view cluster metrics in the console, web console start will continue.
Error: unable to load server certificate: open /var/serving-cert/tls.crt: permission denied
Found this thread, https://github.com/openshift/origin-web-console-server/issues/37, and edited the deploy object for the webconsole to change the permissions on the secret/volume to 444:
oc edit deploy webconsole
deployment.extensions/webconsole edited
and now it works.
oc get pods
NAME                          READY   STATUS        RESTARTS   AGE
webconsole-69b58997df-8bsd2   0/1     Terminating   0          3h
webconsole-69b58997df-vps67   1/1     Running       0          13m
oc logs webconsole-69b58997df-vps67
W1014 09:53:24.021302 1 start.go:93] Warning: config.clusterInfo.loggingPublicURL: Invalid value: "": required to view aggregated container logs in the console, web console start will continue.
W1014 09:53:24.021394 1 start.go:93] Warning: config.clusterInfo.metricsPublicURL: Invalid value: "": required to view cluster metrics in the console, web console start will continue.
I1014 09:53:24.189322 1 start.go:208] OpenShift Web Console Version: v3.11.16
I1014 09:53:24.189669 1 serve.go:89] Serving securely on 0.0.0.0:8443
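The manual oc edit described in this comment can probably also be done as a non-interactive patch. The sketch below makes two assumptions that are not confirmed in this thread: that the secret volume in the webconsole deployment is named serving-cert, and that a strategic merge patch on that volume is enough. patch_serving_cert_mode is a hypothetical function name. Because JSON has no octal literals, mode 0444 is written as its decimal value, 292.

```shell
# Set the serving cert secret volume to mode 0444 without an interactive edit.
# JSON has no octal literals, so octal 0444 is written as decimal 292.
# Assumption: the volume is named "serving-cert"; verify it first with
#   oc get deploy webconsole -n openshift-web-console -o yaml
# patch_serving_cert_mode is a hypothetical helper name.
patch_serving_cert_mode() {
  oc patch deployment webconsole -n openshift-web-console \
    -p '{"spec":{"template":{"spec":{"volumes":[{"name":"serving-cert","secret":{"defaultMode":292}}]}}}}'
}
```

Changing the pod template triggers a new rollout, which matches the Terminating and Running pods seen in the oc get pods output in this comment.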