ingress-nginx: Potential memory leak in OpenSSL
NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.): 0.44.0 & 0.49.0
Kubernetes version (use kubectl version): 1.18.8
Environment:
- Cloud provider or hardware configuration: AlibabaCloud
- OS (e.g. from /etc/os-release): Alibaba Cloud Linux (Aliyun Linux)
- Kernel (e.g. uname -a): 4.19.91-23.al7.x86_64
- Install tools:
  - From AlibabaCloud console
What happened:
We’ve encountered a memory issue in both 0.44.0 and 0.49.0. Some of the ingress pods show high memory usage, while others stay at a normal level.
We ran some diagnostics on an affected pod, and they showed that one of the nginx workers had accumulated a large amount of memory.
The incoming traffic is balanced, about 100 requests per second, and the connection counts across pods are of the same order of magnitude (from 10k+ to 100k+).
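For anyone trying to spot the growing worker on their own pods, here is a minimal Python sketch that lists the resident memory of each nginx worker by reading /proc. It is not what we ran. It assumes a Linux /proc filesystem is visible from where the script runs and that workers carry the process title "nginx: worker process". The function names are made up for this example.

```python
import os
import re

# Matches the "VmRSS:   123456 kB" line in /proc/<pid>/status.
VMRSS = re.compile(r"^VmRSS:\s+(\d+)\s+kB", re.MULTILINE)


def parse_vmrss_kb(status_text):
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    m = VMRSS.search(status_text)
    return int(m.group(1)) if m else None


def nginx_worker_rss(proc_root="/proc"):
    """Return {pid: rss_kb} for processes whose title contains 'nginx: worker process'."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                if b"nginx: worker process" not in f.read():
                    continue
            with open(os.path.join(proc_root, entry, "status")) as f:
                rss = parse_vmrss_kb(f.read())
        except OSError:
            continue  # process exited or is not readable
        if rss is not None:
            result[int(entry)] = rss
    return result


if __name__ == "__main__":
    # Print workers from largest to smallest resident set size.
    for pid, rss in sorted(nginx_worker_rss().items(), key=lambda kv: -kv[1]):
        print(f"{pid}\t{rss} kB")
```

Running it a few times over several minutes should show whether a single worker keeps growing while the others stay flat.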
We then used pmap -x <pid> to get details of the memory. There were lots of tiny anon blocks in the memory map.
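To put a number on "lots of tiny anon blocks", the pmap output can be summarized with a short script. The sketch below is an illustration only. It assumes the procps-style column layout for pmap -x (Address, Kbytes, RSS, Dirty, Mode, Mapping), and other pmap implementations may print different columns. The 64 kB threshold for "tiny" is an arbitrary choice, and the sample text is invented to show the format, not taken from the affected pod.

```python
def small_anon_summary(pmap_text, max_kb=64):
    """Count anonymous mappings no larger than max_kb and sum their RSS (kB)."""
    count = 0
    rss_kb = 0
    for line in pmap_text.splitlines():
        parts = line.split(None, 5)
        # Data rows look like: address kbytes rss dirty mode mapping
        if len(parts) < 6 or not parts[1].isdigit() or not parts[2].isdigit():
            continue  # header, separator, or total line
        size_kb, rss = int(parts[1]), int(parts[2])
        if "anon" in parts[5] and size_kb <= max_kb:
            count += 1
            rss_kb += rss
    return count, rss_kb


# Hypothetical sample in procps `pmap -x` layout (made-up numbers).
SAMPLE_PMAP = """\
12345:   nginx: worker process
Address           Kbytes     RSS   Dirty Mode  Mapping
0000560000000000    1200    1100       0 r-x-- nginx
00007f0000000000       8       8       8 rw---   [ anon ]
00007f0000010000      16      12      12 rw---   [ anon ]
00007f0000100000   65536   40000   40000 rw---   [ anon ]
00007f0000200000     600     400       0 r-x-- libcrypto.so.1.1
---------------- ------- ------- -------
total kB           67360   41520   40020
"""

if __name__ == "__main__":
    n, rss = small_anon_summary(SAMPLE_PMAP)
    print(f"{n} small anon mappings, {rss} kB resident")
```

Comparing this count between a healthy worker and the bloated one would show whether the small anonymous mappings account for the difference.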
We took a core dump and looked at this memory area; most of its content seems to be related to TLS certs. We also ran memleak on the process, with the following result:
[16:18:49] Top 10 stacks with outstanding allocations:
300580 bytes in 15029 allocations from stack
CRYPTO_strdup+0x30 [libcrypto.so.1.1]
[unknown]
462706 bytes in 375 allocations from stack
[unknown] [libcrypto.so.1.1]
507864 bytes in 9069 allocations from stack
CRYPTO_zalloc+0xa [libcrypto.so.1.1]
[unknown]
536576 bytes in 131 allocations from stack
[unknown] [libcrypto.so.1.1]
848638 bytes in 333 allocations from stack
ngx_alloc+0xf [nginx]
[unknown]
2100720 bytes in 22253 allocations from stack
CRYPTO_zalloc+0xa [libcrypto.so.1.1]
3074792 bytes in 888 allocations from stack
BUF_MEM_grow+0x81 [libcrypto.so.1.1]
3496960 bytes in 4398 allocations from stack
posix_memalign+0x1a [ld-musl-x86_64.so.1]
5821440 bytes in 9096 allocations from stack
[unknown] [libssl.so.1.1]
[unknown]
9060080 bytes in 22605 allocations from stack
CRYPTO_zalloc+0xa [libcrypto.so.1.1]
[16:18:58] Top 10 stacks with outstanding allocations:
287280 bytes in 14364 allocations from stack
CRYPTO_strdup+0x30 [libcrypto.so.1.1]
[unknown]
393216 bytes in 96 allocations from stack
[unknown] [libcrypto.so.1.1]
396428 bytes in 322 allocations from stack
[unknown] [libcrypto.so.1.1]
486080 bytes in 8680 allocations from stack
CRYPTO_zalloc+0xa [libcrypto.so.1.1]
[unknown]
724916 bytes in 286 allocations from stack
ngx_alloc+0xf [nginx]
[unknown]
1949832 bytes in 20300 allocations from stack
CRYPTO_zalloc+0xa [libcrypto.so.1.1]
2032380 bytes in 727 allocations from stack
BUF_MEM_grow+0x81 [libcrypto.so.1.1]
3760256 bytes in 5049 allocations from stack
posix_memalign+0x1a [ld-musl-x86_64.so.1]
5575680 bytes in 8712 allocations from stack
[unknown] [libssl.so.1.1]
[unknown]
8525968 bytes in 20572 allocations from stack
CRYPTO_zalloc+0xa [libcrypto.so.1.1]
[16:19:06] Top 10 stacks with outstanding allocations:
716420 bytes in 35821 allocations from stack
CRYPTO_strdup+0x30 [libcrypto.so.1.1]
[unknown]
782336 bytes in 191 allocations from stack
[unknown] [libcrypto.so.1.1]
885218 bytes in 721 allocations from stack
[unknown] [libcrypto.so.1.1]
1233680 bytes in 22030 allocations from stack
CRYPTO_zalloc+0xa [libcrypto.so.1.1]
[unknown]
1761982 bytes in 775 allocations from stack
ngx_alloc+0xf [nginx]
[unknown]
3814396 bytes in 1525 allocations from stack
BUF_MEM_grow+0x81 [libcrypto.so.1.1]
4298576 bytes in 48880 allocations from stack
CRYPTO_zalloc+0xa [libcrypto.so.1.1]
11922816 bytes in 15455 allocations from stack
posix_memalign+0x1a [ld-musl-x86_64.so.1]
14005760 bytes in 21884 allocations from stack
[unknown] [libssl.so.1.1]
[unknown]
21036912 bytes in 49333 allocations from stack
CRYPTO_zalloc+0xa [libcrypto.so.1.1]
More samples are available in m.log.
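The snapshots are easier to compare when the outstanding bytes are grouped by the library of each stack's top frame. The following Python sketch does that for text in the format shown above. The function names are invented for this example, and the embedded sample is a three-stack excerpt of the 16:18:49 and 16:19:06 snapshots from this report. Keep in mind that memleak reports allocations not yet freed during the sampling window, so growth between two snapshots is a hint rather than proof of a leak.

```python
import re
from collections import defaultdict

HEADER = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]")
ALLOC = re.compile(r"^(\d+) bytes in (\d+) allocations from stack")
LIB = re.compile(r"\[([^\]]+)\]\s*$")  # trailing "[libname]" on a frame line


def bytes_by_library(text):
    """Return {timestamp: {library: outstanding_bytes}} using each stack's top frame."""
    snapshots = {}
    current = None
    pending = None  # bytes of the stack whose top frame has not been read yet
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = HEADER.match(line)
        if m:
            current = snapshots.setdefault(m.group(1), defaultdict(int))
            pending = None
            continue
        m = ALLOC.match(line)
        if m:
            pending = int(m.group(1))
            continue
        if pending is not None and current is not None:
            lib = LIB.search(line)
            current[lib.group(1) if lib else "unknown"] += pending
            pending = None  # deeper frames of the same stack are ignored
    return {ts: dict(libs) for ts, libs in snapshots.items()}


def growth(snapshots, first, last):
    """Per-library difference in outstanding bytes between two snapshots."""
    libs = set(snapshots[first]) | set(snapshots[last])
    return {lib: snapshots[last].get(lib, 0) - snapshots[first].get(lib, 0)
            for lib in libs}


# Excerpt (three largest stacks) of two snapshots from this report.
SAMPLE = """\
[16:18:49] Top 10 stacks with outstanding allocations:
3496960 bytes in 4398 allocations from stack
posix_memalign+0x1a [ld-musl-x86_64.so.1]
5821440 bytes in 9096 allocations from stack
[unknown] [libssl.so.1.1]
[unknown]
9060080 bytes in 22605 allocations from stack
CRYPTO_zalloc+0xa [libcrypto.so.1.1]
[16:19:06] Top 10 stacks with outstanding allocations:
11922816 bytes in 15455 allocations from stack
posix_memalign+0x1a [ld-musl-x86_64.so.1]
14005760 bytes in 21884 allocations from stack
[unknown] [libssl.so.1.1]
[unknown]
21036912 bytes in 49333 allocations from stack
CRYPTO_zalloc+0xa [libcrypto.so.1.1]
"""

if __name__ == "__main__":
    snaps = bytes_by_library(SAMPLE)
    for lib, delta in sorted(growth(snaps, "16:18:49", "16:19:06").items()):
        print(f"{lib}: {delta:+d} bytes")
```

On this excerpt, the libcrypto and libssl stacks together account for most of the growth between the two snapshots, which matches what we saw in the core dump.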
Finally, we moved the cert to the load balancer provided by the cloud provider, and it’s working fine now, but we still have no clue why this happened.
The leak happens in nginx on connections that use TLS. We tried rebuilding the image to upgrade the libraries to the newest versions (for openssl, 1.1.1l-r0), but it didn’t help.
What you expected to happen:
No memory leak with TLS.
How to reproduce it:
I have no idea what makes the issue happen, and I can’t reproduce it on another cluster.
Anything else we need to know:
So far, we haven’t hit this issue with 0.30.0 (openssl 1.1.1d-r3). I don’t know whether it’s a problem in newer openssl versions.
/kind bug
About this issue
- State: closed
- Created 3 years ago
- Comments: 17 (10 by maintainers)
Hi, I have the same issue. Running nginx -s reload temporarily resolves it. Here is my info:
NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):
NGINX Ingress controller
Release: v0.47.0
Build: 7201e37633485d1f14dbe9cd7b22dd380df00a07
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.20.1
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:59:11Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.9-gke.1001", GitCommit:"1fe18c314ed577f6047d2712a9d1c8e498e22381", GitTreeState:"clean", BuildDate:"2021-08-23T23:06:28Z", GoVersion:"go1.15.13b5", Compiler:"gc", Platform:"linux/amd64"}
Environment:
- Kernel (e.g. uname -a): Linux ingress-nginx-controller-788c5f7f88-d94pj 5.4.120+ #1 SMP Tue Jun 22 14:53:20 PDT 2021 x86_64 Linux
- Helm: helm -n ingress-nginx get values ingress-nginx
  USER-SUPPLIED VALUES:
kubectl describe po -n ingress-nginx ingress-nginx-controller-788c5f7f88-d94pj
kubectl describe svc -n ingress-nginx ingress-nginx-controller
+1 still happening