rancher: Unable to create ingresses on Amazon EC2 node driver clusters in an HA Rancher setup on a k8s 1.20 cluster

Information about the Cluster Rancher Server Setup

  • Rancher version: v2.5.11
  • Installation option (Docker install/Helm Chart): Helm Chart
    • If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc): RKE1, v1.20.12-rancher1-2, 1.2.14
  • Proxy/Cert Details: Self-signed

Information about the Cluster

  • Kubernetes version: v1.20.12
  • Cluster Type (Local/Downstream): Downstream node driver 1 worker, 1 etcd, 1 cp RKE1

Describe the bug Creating an ingress on an Amazon EC2 node driver cluster provisioned from a Rancher HA server fails with a "failed calling webhook" error. More details in the result and additional context below.

To Reproduce

  1. Create a rancher HA server on v2.5.11
  2. Create a downstream RKE1 node driver Amazon EC2 cluster with any node count
  3. From any project create a workload
  4. Create an ingress pointing to this workload

Result The ingress creation fails with the following error:

baseType: "error"
code: "InternalError"
message: "Internal error occurred: failed calling webhook \"validate.nginx.ingress.kubernetes.io\": Post \"https://ingress-nginx-controller-admission.ingress-nginx.svc:443/networking/v1beta1/ingresses?timeout=10s\": context deadline exceeded"
status: 500
type: "error"

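A few diagnostic commands help confirm it is the admission webhook that is unreachable (a sketch; assumes kubectl is pointed at the downstream cluster and ingress-nginx is deployed in the default namespace and names used by RKE):

```shell
# Inspect the webhook registration (failurePolicy, timeoutSeconds, service ref).
kubectl get validatingwebhookconfiguration ingress-nginx-admission -o yaml

# Verify the admission service exists and has endpoints; empty endpoints or a
# blocked node-to-node port 8443 both surface as "context deadline exceeded".
kubectl -n ingress-nginx get svc,endpoints ingress-nginx-controller-admission
```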
Expected Result The ingress creation goes through without any error.

Additional context

  1. Not seen on DigitalOcean node driver RKE clusters on HA, or on custom clusters.
  2. Not seen on Docker-install setups with DigitalOcean, Amazon EC2, or custom clusters.
  3. Only seen with Amazon EC2 node driver clusters on an HA setup.
  4. Also not seen on clusters running k8s v1.19.16.

Errors seen in the rancher logs:

W1204 02:19:48.667464       7 warnings.go:80] extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
W1204 02:20:04.308015       7 warnings.go:80] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W1204 02:22:02.682439       7 warnings.go:80] extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
W1204 02:26:15.307942       7 warnings.go:80] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition


2021/12/04 02:04:45 [ERROR] error syncing 'ingress-ip-domain': handler copy-settings: the server could not find the requested resource, requeuing
2021/12/04 02:04:45 [ERROR] error syncing 'install-uuid': handler copy-settings: the server could not find the requested resource, requeuing
2021/12/04 02:04:45 [ERROR] error syncing 'ingress-ip-domain': handler copy-settings: the server could not find the requested resource, requeuing
2021/12/04 02:04:45 [ERROR] error syncing 'install-uuid': handler copy-settings: the server could not find the requested resource, requeuing
2021/12/04 02:04:45 [ERROR] error syncing 'ingress-ip-domain': handler copy-settings: the server could not find the requested resource, requeuing

Errors seen in ingress-controller:

E1204 02:26:15.592042       6 server.go:77] "Failed to decode request body" err="couldn't get version/kind; json parse error: unexpected end of JSON input"
2021/12/04 02:26:16 http: TLS handshake error from :38364: read tcp 172.31.13.232:8443->:38364: read: connection reset by peer
E1204 02:26:17.685127       6 server.go:77] "Failed to decode request body" err="couldn't get version/kind; json parse error: unexpected end of JSON input"
E1204 02:26:17.758252       6 server.go:77] "Failed to decode request body" err="couldn't get version/kind; json parse error: unexpected end of JSON input"
 - - [04/Dec/2021:02:26:45 +0000] "GET /v3/connect/config HTTP/2.0" 200 19691 "-" "Go-http-client/2.0" 2902 0.010 [cattle-system-rancher-80] [] 10.42.0.6:80 19704 0.012 200 69048a76b18c5e7762850db3ecb19c5d
 - - [04/Dec/2021:02:27:40 +0000] "GET /v3/connect/config HTTP/2.0" 200 19708 "-" "Go-http-client/2.0" 2900 0.007 [cattle-system-rancher-80] [] 10.42.2.6:80 19721 0.008 200 73e8898aea23c402278561bdbec31eaa
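The decode/handshake errors above suggest some traffic reaches the controller but the API server's webhook calls do not complete. A quick in-cluster connectivity check (a sketch using a throwaway curl pod; pod name and image are arbitrary):

```shell
# Call the admission endpoint from inside the cluster. A 5s client timeout
# mirrors the webhook's "context deadline exceeded" if port 8443 is blocked
# between nodes; any HTTP response at all means connectivity is fine.
kubectl -n ingress-nginx run curl-test --rm -i --restart=Never \
  --image=curlimages/curl -- \
  curl -k -m 5 https://ingress-nginx-controller-admission.ingress-nginx.svc:443/
```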

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 16 (12 by maintainers)

Most upvoted comments

Pretty sure this is because the fix we implemented in RKE for this issue (https://github.com/rancher/rke/pull/2626) was never backported: it was scoped to k8s 1.21 and up, and while the NGINX ingress version that initially shipped only with k8s 1.21 and up was backported to older k8s versions, the fix itself was not.

The fix here is to change the version scope from k8s to NGINX ingress, scope it to >= 0.48.0 (based on the templates), and backport it to Rancher 2.5/RKE 1.2.

The workaround is to manually set the ingress network mode to hostPort in cluster.yml:

ingress:
  network_mode: hostPort
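In context, the setting sits under the ingress section of cluster.yml (a sketch; the provider line matches the repro config below), and re-running `rke up` applies the change:

```yaml
# cluster.yml excerpt applying the hostPort workaround for the
# ingress-nginx admission webhook reachability problem.
ingress:
    provider: nginx
    network_mode: hostPort
```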

I was able to reproduce this in rancher. The trick seems to be using a cluster with more than 2 nodes.

I was also able to reproduce this without rancher, with rke alone. So I think this issue should be transferred to the rke team.

Repro steps for RKE alone:

  1. Create two EC2 nodes
  2. Create an rke cluster with cluster.yml:
nodes:
    - address: <addr>.us-west-2.compute.amazonaws.com
      user: ubuntu
      role:
        - controlplane
        - etcd
      ssh_key_path: /home/colleen/Downloads/colleen-ec2.pem
    - address: <addr>.us-west-2.compute.amazonaws.com
      user: ubuntu
      role:
        - worker
      ssh_key_path: /home/colleen/Downloads/colleen-ec2.pem
services:
    etcd:
        snapshot: true
        creation: 6h
        retention: 24h
kubernetes_version: v1.20.13-rancher1-1
ingress:
    provider: nginx
  3. Create workload
$ kubectl create deploy nginx --image=nginx:latest --port=80
deployment.apps/nginx created
  4. Create service
$ kubectl create svc clusterip nginx --tcp=80:80
service/nginx created
  5. Create ingress
$ cat ingress.yaml 
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: test1
  namespace: default
spec:
  rules:
  - host: test1.example.com
    http:
      paths:
      - backend:
          service:
            name: nginx
            port:
              number: 80
        path: /
        pathType: ImplementationSpecific
$ kubectl apply -f ingress.yaml 
Error from server (InternalError): error when creating "ingress.yaml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post "https://ingress-nginx-controller-admission.ingress-nginx.svc:443/networking/v1beta1/ingresses?timeout=10s": context deadline exceeded

Also happens with extensions/v1beta1 and networking.k8s.io/v1beta1:

$ cat ingress.yaml 
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: test1
  namespace: default
spec:
  rules:
  - host: test1.example.com
    http:
      paths:
      - backend:
          serviceName: nginx
          servicePort: 80
        path: /
$ kubectl apply -f ingress.yaml 
Warning: extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
Error from server (InternalError): error when creating "ingress.yaml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post "https://ingress-nginx-controller-admission.ingress-nginx.svc:443/networking/v1beta1/ingresses?timeout=10s": context deadline exceeded

Extending the timeout in the ingress-nginx-admission validatingwebhookconfiguration doesn’t help, nor does changing the apiGroups or apiVersions setting.
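For reference, the timeout that was extended lives on the webhook entry itself (excerpt of the ingress-nginx-admission ValidatingWebhookConfiguration; timeoutSeconds caps at 30):

```yaml
webhooks:
- name: validate.nginx.ingress.kubernetes.io
  # Raised from the default 10s; the API server still reports
  # "context deadline exceeded", so the webhook is unreachable, not slow.
  timeoutSeconds: 30
```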

This issue seems to be known and unsolved in the upstream ingress-nginx controller: https://github.com/kubernetes/ingress-nginx/issues/5401

Workaround summary for release notes:

On custom clusters with two or more nodes provisioned via the Amazon EC2 infrastructure provider with default settings, a configuration problem causes creation of Ingress resources to fail. Options to work around this are either:

  1. Select a Kubernetes version of v1.21 or greater when creating the cluster, or upgrade the existing cluster to v1.21 or greater.
  2. In the EC2 node template, when configuring security groups, instead of selecting "Standard: Automatically create a rancher-nodes group", manually create the security group in the AWS console following the documented port requirements, additionally allow incoming access to port 8443 from within the security group, and then select that group under "Choose one or more existing groups" in Rancher.
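The extra rule in option 2 can also be added from the AWS CLI (a sketch; the group ID is a placeholder for the security group attached to the nodes):

```shell
# Placeholder group ID; substitute the group created per the port requirements.
SG_ID="sg-0123456789abcdef0"

# Allow node-to-node traffic on 8443 (the ingress-nginx admission webhook
# port) from within the same security group.
aws ec2 authorize-security-group-ingress \
  --group-id "$SG_ID" \
  --protocol tcp \
  --port 8443 \
  --source-group "$SG_ID"
```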