nextdns: Memory leaking (when recovering from endpoint failures?)

Context

nextdns version 1.32.2
EdgeRouter X v2.0.9-hotfix.2

For days I’ve had a lot of problems with the NextDNS client failing, with lots of cached HTTP/2.0: doh resolve: context deadline exceeded messages in the logs. Starting yesterday, the process has been getting killed by the OOM killer. This issue focuses on that. I’ll open another to focus on why the failures are happening in the first place.

It appears that when the following kind of dance occurs, the NextDNS resident memory size grows:

Jun  2 14:27:50 ubnt nextdns[14380]: message repeated 19 times: [ Received signal: broken pipe (ignored)]
Jun  2 14:27:50 ubnt nextdns[14380]: Endpoint provider failed: &{dns.nextdns.io. https://dns.nextdns.io#45.90.28.0,2a07:a8c0::,45.90.30.0,2a07:a8c1::}: exchange: roundtrip: unexpected EOF
Jun  2 14:27:53 ubnt nextdns[14380]: Connected 104.238.181.28:443 (con=11ms tls=1472ms, TCP, TLS13)
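To check whether memory growth tracks these events, it may help to count how often the failure message appears per hour. The following is only a rough sketch: it assumes the syslog line format shown above, and the function name count_endpoint_failures is made up for illustration (it is not part of nextdns).

```python
import re
from collections import Counter

# Assumed syslog layout, based on the excerpt above:
# "Jun  2 14:27:50 ubnt nextdns[14380]: Endpoint provider failed: ..."
LINE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<hour>\d{2}):\d{2}:\d{2}\s+\S+\s+"
    r"nextdns\[(?P<pid>\d+)\]:\s+(?P<msg>.*)$"
)


def count_endpoint_failures(lines):
    """Count 'Endpoint provider failed' messages per (month, day, hour).

    Hypothetical helper for illustration; lines is any iterable of
    syslog lines (for example, an open log file).
    """
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        # Only nextdns lines match the pattern; kernel lines are skipped.
        if m and m.group("msg").startswith("Endpoint provider failed"):
            key = (m.group("month"), int(m.group("day")), int(m.group("hour")))
            counts[key] += 1
    return counts
```

Passing an open log file to count_endpoint_failures gives hourly totals that can be compared against the times the process was killed. Where the log file lives depends on the system's syslog setup.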

After enough of those, the process is eventually killed:

Jun  2 16:04:00 ubnt kernel: nextdns invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null),  order=0, oom_score_adj=0
Jun  2 16:04:00 ubnt kernel: CPU: 3 PID: 14383 Comm: nextdns Tainted: P           O    4.14.54-UBNT #1
Jun  2 16:04:00 ubnt kernel: Out of memory: Kill process 14380 (nextdns) score 525 or sacrifice child
Jun  2 16:04:00 ubnt kernel: Killed process 14380 (nextdns) total-vm:671096kB, anon-rss:136880kB, file-rss:0kB, shmem-rss:0kB
Jun  2 16:04:00 ubnt kernel: Process 14383 (nextdns) has crashed (parent 1 (systemd) signal 11, code 128, addr   (null)), preparing coredump
Jun  2 16:04:00 ubnt kernel: Error while handling coredump for 14383 (nextdns): coredump_wait(siginfo->si_signo, &core_state) < 0
Jun  2 16:04:00 ubnt kernel: Process 14392 (nextdns) has crashed (parent 1 (systemd) signal 11, code 128, addr   (null)), preparing coredump
Jun  2 16:04:00 ubnt kernel: Process 14441 (nextdns) has crashed (parent 1 (systemd) signal 11, code 128, addr   (null)), preparing coredump
Jun  2 16:04:00 ubnt kernel: Error while handling coredump for 14441 (nextdns): coredump_wait(siginfo->si_signo, &core_state) < 0
Jun  2 16:04:00 ubnt kernel: Process 14439 (nextdns) has crashed (parent 1 (systemd) signal 11, code 128, addr   (null)), preparing coredump
Jun  2 16:04:00 ubnt kernel: Error while handling coredump for 14439 (nextdns): coredump_wait(siginfo->si_signo, &core_state) < 0
Jun  2 16:04:00 ubnt kernel: Error while handling coredump for 14392 (nextdns): coredump_wait(siginfo->si_signo, &core_state) < 0

How fast this occurs seems related to how much DNS traffic is happening on my network. Last night I restarted things before bed and it took about 4 hours. This morning after restarting, it took about 90 minutes.
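To put numbers on the growth rate, the resident set size can be sampled periodically from /proc. This is a minimal sketch, assuming a Linux /proc filesystem and a Python interpreter on the box (or on a machine the status file is copied to). The helper names are hypothetical and the PID has to be supplied by hand.

```python
import re
import time


def parse_vmrss_kb(status_text):
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status.

    Returns None if no VmRSS line is present.
    """
    m = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def read_rss_kb(pid):
    """Read the current resident set size of a process, in kB."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kb(f.read())


def sample_rss(pid, interval_s=60, samples=10):
    """Collect (unix_timestamp, rss_kb) pairs, one every interval_s seconds."""
    out = []
    for _ in range(samples):
        out.append((time.time(), read_rss_kb(pid)))
        time.sleep(interval_s)
    return out
```

Calling sample_rss with the nextdns PID and lining the samples up against log timestamps should show whether RSS jumps around each endpoint failure or grows steadily with query volume.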

About this issue

State: closed
Created 3 years ago
Reactions: 1
Comments: 20 (7 by maintainers)

Most upvoted comments

Has this issue been fixed already? I still have this problem on CLI 1.37.2. Memory usage grows by about 35 MB in just a minute, apparently because of endpoint failures (?). The log is full of cache fallback HTTP/2.0: doh resolve: context deadline exceeded.

andrew-susanto on Oct 9, 2021

Please try 1.32.3.

rs on Jun 2, 2021