ingress-nginx: Potential memory leak in OpenSSL

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.): 0.44.0 & 0.49.0

Kubernetes version (use kubectl version): 1.18.8

Environment:

  • Cloud provider or hardware configuration: AlibabaCloud
  • OS (e.g. from /etc/os-release): Alibaba Cloud Linux (Aliyun Linux)
  • Kernel (e.g. uname -a): 4.19.91-23.al7.x86_64
  • Install tools:
    • From AlibabaCloud console

What happened:

We’ve encountered a memory issue in both 0.44.0 and 0.49.0. Some of the ingress pods show high memory usage, while others stay at a normal level.

[screenshot: memory usage of the ingress pods]

We ran some diagnostics on the pod, which showed that one of the nginx worker processes had accumulated a large amount of memory.

[screenshot: per-process memory usage inside the pod]

The incoming traffic is balanced, about 100 requests per second, and the connection count across pods is of the same order of magnitude (from 10k+ to 100k+).

We then used pmap -x <pid> to get details of the memory. There were lots of tiny anonymous blocks in the memory map.

[screenshot: pmap -x output]
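
For context, a rough sketch of the kind of inspection that surfaces this, run on the node or from a debug shell in the pod (pgrep/awk availability and the exact flags are assumptions; the worker PID is a placeholder):

# Find the nginx worker with the largest resident set:
for pid in $(pgrep -f 'nginx: worker'); do
  printf '%s %s kB\n' "$pid" "$(awk '/VmRSS/ {print $2}' /proc/$pid/status)"
done | sort -k2 -n

# Then dump the extended memory map of the heavy worker and look for the
# many small anonymous mappings:
pmap -x <worker-pid>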

We made a core dump and took a look at this memory area; most of its content seems to be related to TLS certificates. We also ran memleak on the process, with these results:

[16:18:49] Top 10 stacks with outstanding allocations:
	300580 bytes in 15029 allocations from stack
		CRYPTO_strdup+0x30 [libcrypto.so.1.1]
		[unknown]
	462706 bytes in 375 allocations from stack
		[unknown] [libcrypto.so.1.1]
	507864 bytes in 9069 allocations from stack
		CRYPTO_zalloc+0xa [libcrypto.so.1.1]
		[unknown]
	536576 bytes in 131 allocations from stack
		[unknown] [libcrypto.so.1.1]
	848638 bytes in 333 allocations from stack
		ngx_alloc+0xf [nginx]
		[unknown]
	2100720 bytes in 22253 allocations from stack
		CRYPTO_zalloc+0xa [libcrypto.so.1.1]
	3074792 bytes in 888 allocations from stack
		BUF_MEM_grow+0x81 [libcrypto.so.1.1]
	3496960 bytes in 4398 allocations from stack
		posix_memalign+0x1a [ld-musl-x86_64.so.1]
	5821440 bytes in 9096 allocations from stack
		[unknown] [libssl.so.1.1]
		[unknown]
	9060080 bytes in 22605 allocations from stack
		CRYPTO_zalloc+0xa [libcrypto.so.1.1]
[16:18:58] Top 10 stacks with outstanding allocations:
	287280 bytes in 14364 allocations from stack
		CRYPTO_strdup+0x30 [libcrypto.so.1.1]
		[unknown]
	393216 bytes in 96 allocations from stack
		[unknown] [libcrypto.so.1.1]
	396428 bytes in 322 allocations from stack
		[unknown] [libcrypto.so.1.1]
	486080 bytes in 8680 allocations from stack
		CRYPTO_zalloc+0xa [libcrypto.so.1.1]
		[unknown]
	724916 bytes in 286 allocations from stack
		ngx_alloc+0xf [nginx]
		[unknown]
	1949832 bytes in 20300 allocations from stack
		CRYPTO_zalloc+0xa [libcrypto.so.1.1]
	2032380 bytes in 727 allocations from stack
		BUF_MEM_grow+0x81 [libcrypto.so.1.1]
	3760256 bytes in 5049 allocations from stack
		posix_memalign+0x1a [ld-musl-x86_64.so.1]
	5575680 bytes in 8712 allocations from stack
		[unknown] [libssl.so.1.1]
		[unknown]
	8525968 bytes in 20572 allocations from stack
		CRYPTO_zalloc+0xa [libcrypto.so.1.1]
[16:19:06] Top 10 stacks with outstanding allocations:
	716420 bytes in 35821 allocations from stack
		CRYPTO_strdup+0x30 [libcrypto.so.1.1]
		[unknown]
	782336 bytes in 191 allocations from stack
		[unknown] [libcrypto.so.1.1]
	885218 bytes in 721 allocations from stack
		[unknown] [libcrypto.so.1.1]
	1233680 bytes in 22030 allocations from stack
		CRYPTO_zalloc+0xa [libcrypto.so.1.1]
		[unknown]
	1761982 bytes in 775 allocations from stack
		ngx_alloc+0xf [nginx]
		[unknown]
	3814396 bytes in 1525 allocations from stack
		BUF_MEM_grow+0x81 [libcrypto.so.1.1]
	4298576 bytes in 48880 allocations from stack
		CRYPTO_zalloc+0xa [libcrypto.so.1.1]
	11922816 bytes in 15455 allocations from stack
		posix_memalign+0x1a [ld-musl-x86_64.so.1]
	14005760 bytes in 21884 allocations from stack
		[unknown] [libssl.so.1.1]
		[unknown]
	21036912 bytes in 49333 allocations from stack
		CRYPTO_zalloc+0xa [libcrypto.so.1.1]

More samples are attached in m.log.
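
These look like traces from the memleak tool in bcc-tools; assuming that tool, a minimal invocation against the heavy worker from the node would be something like the following (the path, the 10-second interval, and the PID are assumptions/placeholders):

/usr/share/bcc/tools/memleak -p <worker-pid> 10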

Finally, we moved the certificate to the load balancer provided by the cloud, and it’s working fine now, but we still have no clue why this happens.
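
One way such a workaround is commonly done on Alibaba Cloud is to terminate TLS on the SLB by annotating the controller Service; a rough sketch only (the Service name and certificate ID are placeholders, and the annotation keys follow the Alibaba Cloud CCM documentation and may differ by version):

# Terminate HTTPS at the SLB instead of at nginx (assumptions as noted above):
kubectl annotate svc -n ingress-nginx <controller-service> \
  service.beta.kubernetes.io/alibaba-cloud-loadbalancer-protocol-port="https:443" \
  service.beta.kubernetes.io/alibaba-cloud-loadbalancer-cert-id="<your-cert-id>"

With TLS terminated at the SLB, the connections reaching the nginx workers are plain HTTP, which presumably avoids whatever OpenSSL path is leaking.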

The leak happens in nginx, on connections using TLS. We tried rebuilding the image to upgrade the libraries to the newest versions (for openssl, 1.1.1l-r0), but that didn’t help.
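
A minimal sketch of one way such a rebuild could look, assuming an apk upgrade on top of the published Alpine-based controller image (the registry, base tag, target user, and package names are assumptions):

docker build -t <registry>/ingress-nginx-controller:v0.49.0-openssl-1.1.1l - <<'EOF'
FROM k8s.gcr.io/ingress-nginx/controller:v0.49.0
USER root
# Pull the newest OpenSSL 1.1 libraries from the Alpine repositories (1.1.1l-r0 at the time)
RUN apk add --no-cache --upgrade openssl libssl1.1 libcrypto1.1
# Drop back to the unprivileged user the upstream image runs as (assumption: www-data)
USER www-data
EOF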

What you expected to happen:

no memory leak with TLS

How to reproduce it:

I have no idea what makes the issue happen, and I can’t reproduce it on another cluster.

Anything else we need to know:

So far, we haven’t hit this issue with 0.30.0 (openssl 1.1.1d-r3), so I don’t know whether it’s a problem in the newer OpenSSL.

/kind bug

Most upvoted comments

Hi, I have the same issue:

[screenshot dated 2021-09-25 19:50]

nginx -s reload temporarily solves the issue.
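
The reload can be triggered without restarting the pod (the pod name is a placeholder); it replaces the nginx worker processes, which is presumably why the accumulated memory is released:

kubectl exec -n ingress-nginx <controller-pod> -- nginx -s reload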

Here is my info:

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):


NGINX Ingress controller
  Release:       v0.47.0
  Build:         7201e37633485d1f14dbe9cd7b22dd380df00a07
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.20.1


Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:59:11Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.9-gke.1001", GitCommit:"1fe18c314ed577f6047d2712a9d1c8e498e22381", GitTreeState:"clean", BuildDate:"2021-08-23T23:06:28Z", GoVersion:"go1.15.13b5", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: GCP
  • Kernel (e.g. uname -a): Linux ingress-nginx-controller-788c5f7f88-d94pj 5.4.120+ #1 SMP Tue Jun 22 14:53:20 PDT 2021 x86_64 Linux

Helm: helm -n ingress-nginx get values ingress-nginx

USER-SUPPLIED VALUES:

controller:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app.kubernetes.io/name
              operator: In
              values:
              - nginx-ingress
          topologyKey: kubernetes.io/hostname
        weight: 100
  config:
    use-gzip: true
  metrics:
    enabled: true
    serviceMonitor:
      additionalLabels:
        release: kube-prometheus-stack
      enabled: true
      namespace: monitoring
  replicaCount: 2
  resources:
    requests:
      memory: 800Mi
  service:
    externalTrafficPolicy: Local
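
For completeness, these values would be applied with a standard Helm upgrade; a sketch only (the chart repo alias and values file name are placeholders, and the chart version is taken from the helm.sh/chart label below):

helm upgrade --install ingress-nginx <repo>/ingress-nginx -n ingress-nginx --version 3.34.0 -f values.yaml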

kubectl describe po -n ingress-nginx ingress-nginx-controller-788c5f7f88-d94pj

Name:         ingress-nginx-controller-788c5f7f88-d94pj
Namespace:    ingress-nginx
Priority:     0
Node:         gke-production-pool-1-66bb3111-sldn/10.132.0.4
Start Time:   Sat, 18 Sep 2021 17:17:13 +0200
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx
              app.kubernetes.io/name=ingress-nginx
              pod-template-hash=788c5f7f88
Annotations:  kubectl.kubernetes.io/restartedAt: 2021-09-18T17:17:13+02:00
Status:       Running
IP:           10.52.3.39
IPs:
  IP:           10.52.3.39
Controlled By:  ReplicaSet/ingress-nginx-controller-788c5f7f88
Containers:
  controller:
    Container ID:  containerd://74fb58bce33d84fb54fb61a3a16772d6edf8858cc14a05c21d0feb79a90e8157
    Image:         k8s.gcr.io/ingress-nginx/controller:v0.47.0@sha256:a1e4efc107be0bb78f32eaec37bef17d7a0c81bec8066cdf2572508d21351d0b
    Image ID:      k8s.gcr.io/ingress-nginx/controller@sha256:a1e4efc107be0bb78f32eaec37bef17d7a0c81bec8066cdf2572508d21351d0b
    Ports:         80/TCP, 443/TCP, 10254/TCP, 8443/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP
    Args:
      /nginx-ingress-controller
      --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
      --election-id=ingress-controller-leader
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
    State:          Running
      Started:      Sat, 18 Sep 2021 17:17:14 +0200
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:      100m
      memory:   800Mi
    Liveness:   http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:  http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       ingress-nginx-controller-788c5f7f88-d94pj (v1:metadata.name)
      POD_NAMESPACE:  ingress-nginx (v1:metadata.namespace)
      LD_PRELOAD:     /usr/local/lib/libmimalloc.so
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from ingress-nginx-token-cn2nx (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-nginx-admission
    Optional:    false
  ingress-nginx-token-cn2nx:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-nginx-token-cn2nx
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>

kubectl describe svc -n ingress-nginx ingress-nginx-controller

Name:                     ingress-nginx-controller
Namespace:                ingress-nginx
Labels:                   app.kubernetes.io/component=controller
                          app.kubernetes.io/instance=ingress-nginx
                          app.kubernetes.io/managed-by=Helm
                          app.kubernetes.io/name=ingress-nginx
                          app.kubernetes.io/version=0.47.0
                          helm.sh/chart=ingress-nginx-3.34.0
Annotations:              cloud.google.com/neg: {"ingress":true}
                          meta.helm.sh/release-name: ingress-nginx
                          meta.helm.sh/release-namespace: ingress-nginx
Selector:                 app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
Type:                     LoadBalancer
IP Families:              <none>
IP:                       10.56.2.89
IPs:                      10.56.2.89
LoadBalancer Ingress:     xxx.xxx.xxx.xxx
Port:                     http  80/TCP
TargetPort:               http/TCP
NodePort:                 http  31463/TCP
Endpoints:                10.52.3.39:80,10.52.4.31:80
Port:                     https  443/TCP
TargetPort:               https/TCP
NodePort:                 https  30186/TCP
Endpoints:                10.52.3.39:443,10.52.4.31:443
Session Affinity:         None
External Traffic Policy:  Local
HealthCheck NodePort:     30802
Events:                   <none>

+1 still happening