netty: Netty DNS Resolver cannot handle DNS Response with A, AAAA and NS records: SearchDomainUnknownHostException

Context

We are using Netty in our Java applications running on Azure Container Apps (which runs on top of Kubernetes internally). Our application regularly calls a dependency, always using the same request. Our DNS servers sometimes return just an A record, which Netty handles correctly, but sometimes they return a longer response with additional records:

Standard query response 0x27cc A <URL> CNAME <URL-2> A <IP> NS b.root-servers.net NS m.root-servers.net NS d.root-servers.net NS h.root-servers.net NS f.root-servers.net NS c.root-servers.net NS g.root-servers.net NS i.root-servers.net NS e.root-servers.net NS a.root-servers.net NS k.root-servers.net NS l.root-servers.net NS j.root-servers.net A <IP> AAAA <IP> AAAA <IP> A <IP> AAAA <IP> AAAA <IP> AAAA <IP> AAAA <IP> A <IP> AAAA <IP> A <IP> A <IP> AAAA <IP> A <IP> AAAA <IP> AAAA <IP> A <IP> A <IP> A <IP> A <IP> OPT

The authority and additional sections should be ignored, and the answer section should be parsed correctly.
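A minimal JDK-only sketch of that expectation (not Netty's decoder): it reads the header counts defined in RFC 1035, skips the question, walks only the ANCOUNT answer records (skipping the CNAME and keeping the A record) and never reads the authority or additional sections. The sample message only mimics the shape of the capture; `svc.example.test`, the documentation addresses and the class `AnswerSectionParser` are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class AnswerSectionParser {
    static final int TYPE_A = 1, TYPE_NS = 2, TYPE_CNAME = 5, TYPE_AAAA = 28, CLASS_IN = 1;

    // Returns the offset just past a (possibly compressed) domain name starting at pos.
    static int skipName(byte[] msg, int pos) {
        while (true) {
            int len = msg[pos] & 0xFF;
            if (len == 0) return pos + 1;              // root label ends the name
            if ((len & 0xC0) == 0xC0) return pos + 2;  // compression pointer ends the name
            pos += 1 + len;                            // ordinary label
        }
    }

    // Collects IPv4 addresses from the ANSWER section only; NSCOUNT and ARCOUNT are never read.
    static List<String> answerAddresses(byte[] msg) {
        ByteBuffer buf = ByteBuffer.wrap(msg);
        int qdCount = buf.getShort(4) & 0xFFFF;
        int anCount = buf.getShort(6) & 0xFFFF;
        int pos = 12;                                  // the header is 12 bytes
        for (int i = 0; i < qdCount; i++) pos = skipName(msg, pos) + 4; // QTYPE + QCLASS
        List<String> addresses = new ArrayList<>();
        for (int i = 0; i < anCount; i++) {
            pos = skipName(msg, pos);
            int type = buf.getShort(pos) & 0xFFFF;
            int rdLength = buf.getShort(pos + 8) & 0xFFFF; // skip TYPE, CLASS, TTL
            int rdata = pos + 10;
            if (type == TYPE_A && rdLength == 4) {
                addresses.add((msg[rdata] & 0xFF) + "." + (msg[rdata + 1] & 0xFF) + "."
                        + (msg[rdata + 2] & 0xFF) + "." + (msg[rdata + 3] & 0xFF));
            }
            pos = rdata + rdLength;                    // CNAME and other types are skipped
        }
        return addresses;
    }

    // --- building a sample message shaped like the capture (hypothetical names and addresses) ---
    static void u16(ByteArrayOutputStream out, int v) { out.write(v >>> 8); out.write(v); }

    static byte[] encodeName(String name) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!name.isEmpty()) {
            for (String label : name.split("\\.")) {
                out.write(label.length());
                out.writeBytes(label.getBytes(StandardCharsets.US_ASCII));
            }
        }
        out.write(0);                                  // root label
        return out.toByteArray();
    }

    static void record(ByteArrayOutputStream out, String owner, int type, byte[] rdata) {
        out.writeBytes(encodeName(owner));
        u16(out, type); u16(out, CLASS_IN);
        u16(out, 0); u16(out, 60);                     // TTL = 60 s
        u16(out, rdata.length);
        out.writeBytes(rdata);
    }

    static byte[] sampleResponse() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        u16(out, 0x27cc); u16(out, 0x8180);            // ID; flags: response, RD, RA, NOERROR
        u16(out, 1); u16(out, 2); u16(out, 1); u16(out, 2); // QD, AN, NS, AR counts
        out.writeBytes(encodeName("svc.example.test")); u16(out, TYPE_A); u16(out, CLASS_IN);
        record(out, "svc.example.test", TYPE_CNAME, encodeName("svc2.example.test"));    // answer
        record(out, "svc2.example.test", TYPE_A, new byte[] {(byte) 192, 0, 2, 1});      // answer
        record(out, "", TYPE_NS, encodeName("a.root-servers.net"));                      // authority
        record(out, "a.root-servers.net", TYPE_A, new byte[] {(byte) 198, 51, 100, 1});  // additional
        record(out, "a.root-servers.net", TYPE_AAAA, new byte[16]);                      // additional
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(answerAddresses(sampleResponse())); // [192.0.2.1]
    }
}
```

Only the answer's A record is returned; the glue A record in the additional section is never considered.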

Actual behavior

The longer DNS response produces the following Netty error:

 io.netty.resolver.dns.DnsResolveContext$SearchDomainUnknownHostException: Failed to resolve 'xyz.xyz-xyz.location.azurecontainerapps.io' [A(1)] and search domain query for configured domains failed as well: [k8se-apps.svc.cluster.local, svc.cluster.local, cluster.local]
	at io.netty.resolver.dns.DnsResolveContext.finishResolve(DnsResolveContext.java:1088)
	at io.netty.resolver.dns.DnsResolveContext.tryToFinishResolve(DnsResolveContext.java:1035)
	at io.netty.resolver.dns.DnsResolveContext.query(DnsResolveContext.java:422)
	at io.netty.resolver.dns.DnsResolveContext.access$700(DnsResolveContext.java:66)
	at io.netty.resolver.dns.DnsResolveContext$2.operationComplete(DnsResolveContext.java:493)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:583)
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:559)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)
	at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636)
	at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:629)
	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:118)
	at io.netty.resolver.dns.DnsQueryContext.tryFailure(DnsQueryContext.java:261)
	at io.netty.resolver.dns.DnsQueryContext$4.run(DnsQueryContext.java:208)
	at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
	at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:153)
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:403)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.netty.resolver.dns.DnsNameResolverTimeoutException: [32843: /127.0.0.11:53] DefaultDnsQuestion(xyz.xyz-xyz.location.azurecontainerapps.io. IN A) query '32843' via UDP timed out after 5000 milliseconds (no stack trace available)
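The exception wraps a UDP timeout for the bare name and reports that the queries for the three configured search domains failed as well. A minimal sketch of how a stub resolver derives those candidate names, assuming the resolv.conf(5) rules (a name with at least `ndots` dots is tried as-is first, otherwise the search domains come first) and an `ndots` of 5, which is a common Kubernetes default but is not shown in this issue. Netty's own ordering may differ in detail; `SearchDomainCandidates` and `candidates` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class SearchDomainCandidates {
    // Ordered query names for `name`, following the resolv.conf(5) search/ndots rules:
    // names with at least `ndots` dots are tried as-is first, other names are tried
    // with each search domain appended first and as-is last.
    static List<String> candidates(String name, List<String> searchDomains, int ndots) {
        List<String> result = new ArrayList<>();
        if (name.endsWith(".")) {                 // already fully qualified: no search list
            result.add(name);
            return result;
        }
        long dots = name.chars().filter(c -> c == '.').count();
        boolean asIsFirst = dots >= ndots;
        if (asIsFirst) result.add(name);
        for (String domain : searchDomains) result.add(name + "." + domain);
        if (!asIsFirst) result.add(name);
        return result;
    }

    public static void main(String[] args) {
        List<String> domains = List.of("k8se-apps.svc.cluster.local", "svc.cluster.local", "cluster.local");
        // ndots = 5 is an assumption (common Kubernetes default); the issue does not show resolv.conf.
        candidates("xyz.xyz-xyz.location.azurecontainerapps.io", domains, 5).forEach(System.out::println);
    }
}
```

Under these assumptions the target name (four dots) is expanded with each search domain before being tried as-is, so one failed lookup stands for at least four separate A queries to 127.0.0.11:53.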

Expected behavior

The DNS resolution should work without error.

Steps to reproduce

Simulate the DNS response packet above and pass it to Netty.
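One way to simulate the packet is a throwaway UDP server on loopback that answers every query with a canned reply shaped like the capture: an A record in the answer section, an NS record in the authority section and an A record in the additional section. This is a minimal JDK-only sketch; `CannedDnsResponder`, the 192.0.2.1 address and the 60 s TTL are hypothetical. It answers with an A record whatever the QTYPE, assumes a single uncompressed question in the query, and drops any EDNS OPT record the client sends.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical loopback responder: every reply carries 1 answer (A), 1 authority (NS), 1 additional (A).
class CannedDnsResponder implements Runnable, AutoCloseable {
    private final DatagramSocket socket;

    CannedDnsResponder() throws IOException {
        socket = new DatagramSocket(0, InetAddress.getLoopbackAddress()); // ephemeral port
    }

    int port() { return socket.getLocalPort(); }

    @Override
    public void close() { socket.close(); }

    static void u16(ByteArrayOutputStream out, int v) { out.write(v >>> 8); out.write(v); }

    static byte[] encodeName(String name) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!name.isEmpty()) {
            for (String label : name.split("\\.")) {
                out.write(label.length());
                out.writeBytes(label.getBytes(StandardCharsets.US_ASCII));
            }
        }
        out.write(0); // root label
        return out.toByteArray();
    }

    static void record(ByteArrayOutputStream out, byte[] owner, int type, byte[] rdata) {
        out.writeBytes(owner);
        u16(out, type); u16(out, 1);  // CLASS IN
        u16(out, 0); u16(out, 60);    // TTL 60 s
        u16(out, rdata.length);
        out.writeBytes(rdata);
    }

    static byte[] respond(byte[] query) {
        int qEnd = 12;                                      // assumes one uncompressed QNAME
        while ((query[qEnd] & 0xFF) != 0) qEnd += 1 + (query[qEnd] & 0xFF);
        qEnd += 1 + 4;                                      // root label + QTYPE + QCLASS
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(query, 0, 2);                             // echo the query ID
        u16(out, 0x8180);                                   // QR, RD, RA, NOERROR
        u16(out, 1); u16(out, 1); u16(out, 1); u16(out, 1); // QD, AN, NS, AR counts
        out.write(query, 12, qEnd - 12);                    // question only; client OPT dropped
        record(out, new byte[] {(byte) 0xC0, 0x0C}, 1, new byte[] {(byte) 192, 0, 2, 1});      // answer A
        record(out, encodeName(""), 2, encodeName("a.root-servers.net"));                       // authority NS
        record(out, encodeName("a.root-servers.net"), 1, new byte[] {(byte) 198, 51, 100, 1});  // additional A
        return out.toByteArray();
    }

    static byte[] query(int id, String qname) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        u16(out, id); u16(out, 0x0100);                     // RD set
        u16(out, 1); u16(out, 0); u16(out, 0); u16(out, 0);
        out.writeBytes(encodeName(qname));
        u16(out, 1); u16(out, 1);                           // QTYPE A, QCLASS IN
        return out.toByteArray();
    }

    @Override
    public void run() {
        byte[] buf = new byte[1500];
        while (!socket.isClosed()) {
            try {
                DatagramPacket query = new DatagramPacket(buf, buf.length);
                socket.receive(query);
                byte[] reply = respond(Arrays.copyOf(buf, query.getLength()));
                socket.send(new DatagramPacket(reply, reply.length, query.getSocketAddress()));
            } catch (IOException e) {
                return;                                     // socket closed
            }
        }
    }

    // Starts the responder, sends one A query from a client socket and returns the raw reply.
    static byte[] demo() {
        try (CannedDnsResponder responder = new CannedDnsResponder();
             DatagramSocket client = new DatagramSocket()) {
            Thread server = new Thread(responder);
            server.setDaemon(true);
            server.start();
            byte[] q = query(0x27cc, "svc.example.test");
            client.setSoTimeout(2000);
            client.send(new DatagramPacket(q, q.length, InetAddress.getLoopbackAddress(), responder.port()));
            DatagramPacket reply = new DatagramPacket(new byte[1500], 1500);
            client.receive(reply);
            return Arrays.copyOf(reply.getData(), reply.getLength());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] reply = demo();
        System.out.printf("id=0x%02x%02x an=%d ns=%d ar=%d%n", reply[0], reply[1], reply[7], reply[9], reply[11]);
    }
}
```

To aim Netty at the responder, the resolver can probably be built with `DnsNameResolverBuilder.nameServerProvider(new SingletonDnsServerAddressStreamProvider(new InetSocketAddress(InetAddress.getLoopbackAddress(), responder.port())))`; check these classes against the Netty version in use.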

OS version

Ubuntu 22.04

About this issue

  • State: open
  • Created 8 months ago
  • Comments: 40 (21 by maintainers)

Most upvoted comments

The links in #13705 are very interesting, especially the Weaveworks blog post describing the problem:

A problem occurs when two UDP packets are sent via the same socket at the same time from different threads.

UDP is a connection-less protocol, so no packet is sent as a result of the connect(2) syscall (opposite to TCP) and thus, no conntrack entry has been created after the call

This would explain what I showed in my previous comment: Netty reports two DNS UDP writes, but only one of them is captured by tcpdump.
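The two claims in the quoted post can only partly be checked from user space. This JDK-only sketch, with the hypothetical class name `UdpSocketBehaviour`, shows that `connect` on a UDP socket puts nothing on the wire, then sends one datagram from each of two threads through a single connected socket, standing in for the two DNS writes Netty logged. On plain loopback, without the NAT and conntrack conditions the blog describes, both datagrams are expected to arrive, so this reproduces the write pattern, not the drop.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;
import java.util.concurrent.CountDownLatch;

class UdpSocketBehaviour {
    private static final InetAddress LO = InetAddress.getLoopbackAddress();

    // Receives up to `max` datagrams, stopping at the first timeout; returns how many arrived.
    static int drain(DatagramSocket receiver, int max, int timeoutMillis) throws IOException {
        receiver.setSoTimeout(timeoutMillis);
        DatagramPacket in = new DatagramPacket(new byte[512], 512);
        int received = 0;
        try {
            while (received < max) { receiver.receive(in); received++; }
        } catch (SocketTimeoutException timedOut) {
            // no further datagram arrived within the timeout
        }
        return received;
    }

    // connect(2) on a UDP socket only records the default peer, so the receiver sees nothing.
    static int datagramsSeenAfterConnect() {
        try (DatagramSocket receiver = new DatagramSocket(0, LO);
             DatagramSocket sender = new DatagramSocket()) {
            sender.connect(LO, receiver.getLocalPort());
            return drain(receiver, 1, 300);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Two threads, released together by a latch, each send one datagram via the same socket.
    static int datagramsSeenAfterConcurrentSends() {
        try (DatagramSocket receiver = new DatagramSocket(0, LO);
             DatagramSocket shared = new DatagramSocket()) {
            shared.connect(LO, receiver.getLocalPort());
            CountDownLatch start = new CountDownLatch(1);
            Thread[] writers = new Thread[2];
            for (int i = 0; i < writers.length; i++) {
                byte[] payload = {(byte) i};
                writers[i] = new Thread(() -> {
                    try {
                        start.await();
                        shared.send(new DatagramPacket(payload, payload.length, LO, receiver.getLocalPort()));
                    } catch (IOException | InterruptedException e) {
                        throw new RuntimeException(e);
                    }
                });
                writers[i].start();
            }
            start.countDown(); // release both writers at once
            for (Thread w : writers) w.join();
            return drain(receiver, writers.length, 2000);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("after connect: " + datagramsSeenAfterConnect());
        System.out.println("after 2 concurrent sends: " + datagramsSeenAfterConcurrentSends());
    }
}
```

Running the same pattern against 127.0.0.11:53 inside the container while capturing with tcpdump is one way to compare the write count with what actually reaches the wire.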