ingress-nginx: unexpected error storing fake SSL Cert: could not create PEM certificate file /etc/ingress-controller/ssl/default-fake-certificate.pem: open /etc/ingress-controller/ssl/default-fake-certificate.pem: permission denied

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/.):

What keywords did you search in NGINX Ingress controller issues before filing this one? (If you have found any duplicates, you should instead reply there.):


Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

NGINX Ingress controller version: 0.24.1

Kubernetes version (use kubectl version): v1.14.1

Environment:

  • Cloud provider or hardware configuration: VMware environment
  • OS (e.g. from /etc/os-release): Ubuntu 18.04.1
  • Kernel (e.g. uname -a): 4.15.0-47-generic
  • Install tools:
  • Others:

What happened: I deployed ingress-nginx, and the controller pod goes into CrashLoopBackOff:

NAME                                        READY   STATUS             RESTARTS   AGE   IP             NODE    NOMINATED NODE   READINESS GATES
nginx-ingress-controller-5694ccb578-csn72   0/1     CrashLoopBackOff   8          20m   10.244.1.141   node2   <none>           <none>

This is part of the log:

W0503 09:23:37.549224       1 flags.go:214] SSL certificate chain completion is disabled (--enable-ssl-chain-completion=false)
nginx version: nginx/1.15.6
W0503 09:23:37.559659       1 client_config.go:549] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0503 09:23:37.564858       1 main.go:205] Creating API client for https://10.96.0.1:443
I0503 09:23:37.618028       1 main.go:249] Running in Kubernetes cluster version v1.14 (v1.14.1) - git (clean) commit b7394102d6ef778017f2ca4046abbaa23b88c290 - platform linux/amd64
F0503 09:23:37.925586       1 main.go:121] unexpected error storing fake SSL Cert: could not create PEM certificate file /etc/ingress-controller/ssl/default-fake-certificate.pem: open /etc/ingress-controller/ssl/default-fake-certificate.pem: permission denied

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know: node1 is the master and node2 is a regular worker node. I applied the YAML file and the ingress-nginx pod was deployed on node2. Is this related to host permissions?

Most upvoted comments

Solution: Change runAsUser: 33 to runAsUser: 101.

In my case I used the helm upgrade command without specifying the chart version, which meant I ended up with the chart for nginx-ingress version 0.27.x while still using image version 0.26.2.

There is a breaking change in the default of the runAsUser attribute due to the migration to Alpine Linux.
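
A minimal sketch of the change in the controller container's securityContext, assuming the layout of the upstream deploy manifest (the fields other than runAsUser mirror the securityContext quoted further down in this thread):

        securityContext:
          allowPrivilegeEscalation: true
          capabilities:
            drop:
            - ALL
            add:
            - NET_BIND_SERVICE
          # www-data is UID 33 on the older Debian-based image, UID 101 on the Alpine-based image
          runAsUser: 101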

root@node3:/home/z/kubeadm/ingress-nginx# kubectl get pods -n ingress-nginx
NAME                                        READY   STATUS             RESTARTS   AGE
nginx-ingress-controller-5694ccb578-mzb4v   0/1     CrashLoopBackOff   6          6m57s
root@node3:/home/z/kubeadm/ingress-nginx# kubectl logs nginx-ingress-controller-5694ccb578-mzb4v -n ingress-nginx
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:    0.24.1
  Build:      ce418168f
  Repository: https://github.com/kubernetes/ingress-nginx
-------------------------------------------------------------------------------

W0515 14:31:45.066437       1 flags.go:214] SSL certificate chain completion is disabled (--enable-ssl-chain-completion=false)
nginx version: nginx/1.15.6
W0515 14:31:45.095746       1 client_config.go:549] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0515 14:31:45.096599       1 main.go:205] Creating API client for https://10.96.0.1:443
I0515 14:31:45.195979       1 main.go:249] Running in Kubernetes cluster version v1.14 (v1.14.1) - git (clean) commit b7394102d6ef778017f2ca4046abbaa23b88c290 - platform linux/amd64
F0515 14:31:46.356556       1 main.go:121] unexpected error storing fake SSL Cert: could not create PEM certificate file /etc/ingress-controller/ssl/default-fake-certificate.pem: open /etc/ingress-controller/ssl/default-fake-certificate.pem: permission denied

I am using k8s.gcr.io/ingress-nginx/controller:v1.2.0 with Kubernetes v1.23.6 / v1.24.0 (tested on the latter, but it should work on both since 1.22).

The only thing that solved the issue for me was leaving the container's .spec.template.spec.containers[].securityContext at its default and adding this to the Deployment's pod-level .spec.template.spec.securityContext:

sysctls:
  - name: net.ipv4.ip_unprivileged_port_start
    value: "1"

That way runAsUser: 0 is not needed for the *:80 and *:443 ports to work, and you can also run the pod as intended (non-root).

Edit: it also allows the creation of the /etc/ingress-controller/ssl/default-fake-certificate.pem
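
Putting it together, a rough sketch of where those settings sit in the Deployment (field paths follow the standard Deployment spec; the container name here is illustrative, not taken from any particular chart). Note that net.ipv4.ip_unprivileged_port_start is on Kubernetes' safe-sysctl list as of 1.22, which is why no unsafe-sysctl configuration is needed:

spec:
  template:
    spec:
      securityContext:
        sysctls:
        - name: net.ipv4.ip_unprivileged_port_start
          value: "1"
      containers:
      - name: controller
        # no securityContext override here: the container keeps its image defaults
        # (non-root), and the sysctl above lets it bind ports 80/443 without
        # NET_BIND_SERVICE or runAsUser: 0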

Helm Chart Version: 2.13.0
NGINX Ingress Controller Version: v0.35.0
Kubernetes Version: v1.15.12
Docker Version: v18.09.9

The controller had been deployed about 300 days ago and ran without any interruption; then, suddenly, the deployment/pod started failing with the initial error described.

It is able to partially start with runAsUser set to 0 (root); however, it eventually fails trying to chown a tmp file.

I0726 20:14:08.897721       7 main.go:105] SSL fake certificate created /etc/ingress-controller/ssl/default-fake-certificate.pem
I0726 20:14:08.905766       7 ssl.go:528] loading tls certificate from certificate path /usr/local/certificates/cert and key path /usr/local/certificates/key
I0726 20:14:08.946139       7 nginx.go:263] Starting NGINX Ingress controller
...
Error: exit status 1
nginx: the configuration file /tmp/nginx-cfg213950217 syntax is ok
2021/07/26 20:14:16 [emerg] 55#55: chown("/tmp/client-body", 101) failed (1: Operation not permitted)
nginx: [emerg] chown("/tmp/client-body", 101) failed (1: Operation not permitted)
nginx: configuration file /tmp/nginx-cfg213950217 test failed
/etc/nginx # stat /tmp/client-body
  File: /tmp/client-body
  Size: 4096      	Blocks: 8          IO Block: 4096   directory
Device: 300020h/3145760d	Inode: 12389077    Links: 2
Access: (0700/drwx------)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2021-07-26 20:18:27.000000000
Modify: 2021-07-26 20:17:42.000000000
Change: 2021-07-26 20:17:42.000000000

If I add the CHOWN capability to the securityContext, exec into the pod, and then perform a chown -R 101:101 /etc/ingress-controller, things start flowing temporarily, but then it fails loading again shortly thereafter:

I0727 15:39:53.689868       7 status.go:275] updating Ingress ... status from [{10.1.0.61 }] to []
I0727 15:39:55.891559       7 nginx.go:388] Stopping admission controller
I0727 15:39:55.891637       7 nginx.go:396] Stopping NGINX process
E0727 15:39:55.891675       7 nginx.go:329] http: Server closed
2021/07/27 15:39:55 [emerg] 73#73: cannot load certificate "/etc/ingress-controller/ssl/default-fake-certificate.pem": BIO_new_file() failed (SSL: error:0200100D:system library:fopen:Permission denied:fopen('/etc/ingress-controller/ssl/default-fake-certificate.pem','r') error:2006D002:BIO routines:BIO_new_file:system lib)

To further workaround this, I added SETGID and SETUID capabilities to the securityContext as well.

        securityContext:
          allowPrivilegeEscalation: true
          capabilities:
            add:
            - NET_BIND_SERVICE
            - CHOWN
            - SETGID
            - SETUID
            drop:
            - ALL
          runAsUser: 0

The deployment is finally “fixed”. Any other combination still results in the initial failure. What caused this deployment to go haywire?

For what it’s worth, I ran into the very same issue using containerd from EPEL on CentOS 7.6 (containerd-1.2.1-1.el7). Before that, I ran into an issue with nginx being denied permission to bind to 0.0.0.0:80, which I could resolve by running the process as UID 0.

All of this hinted at issues with ACLs or xattrs on the binary, the cert directory, and so on, so I ran a Google query and came across https://github.com/containerd/containerd/issues/2942

Indeed, removing the images from the system and then upgrading to containerd-1.2.4-1.fc30 (it's a static Go binary, after all) made the controller container start just fine after (re)pulling the image.

So, if your environment uses a containerd version affected by that bug (possibly as part of Docker; we don't run Docker, just plain CRI to containerd), you may want to upgrade and try again.