istio: Unable to make TCP DNS queries on one cluster, not others, until DestinationRule recreated
Hi, we have three clusters (dev, preprod and prod), all built identically from the same infrastructure-as-code. On one of them (preprod), applications behind Istio are unable to make TCP DNS queries:
[atcloud@airflow-web-5d9984d64-2pqpm app]$ dig +tcp www.google.com
;; communications error to 10.192.32.10#53: connection reset
Looking at the Istio debug logs, there’s nothing particularly telling:
[2019-02-11 08:07:38.552][28][debug][filter] external/envoy/source/extensions/filters/listener/original_dst/original_dst.cc:18] original_dst: New connection accepted
[2019-02-11 08:07:38.552][28][debug][filter] src/envoy/tcp/mixer/filter.cc:28] Called tcp filter: Filter
[2019-02-11 08:07:38.552][28][debug][filter] src/envoy/tcp/mixer/filter.cc:38] Called tcp filter: initializeReadFilterCallbacks
[2019-02-11 08:07:38.552][28][debug][filter] external/envoy/source/common/tcp_proxy/tcp_proxy.cc:168] [C423] new tcp proxy session
[2019-02-11 08:07:38.552][28][debug][filter] src/envoy/tcp/mixer/filter.cc:98] [C423] Called tcp filter onNewConnection: remote 10.198.13.22:49261, local 10.192.32.10:53
[2019-02-11 08:07:38.552][28][debug][filter] external/envoy/source/common/tcp_proxy/tcp_proxy.cc:305] [C423] Creating connection to cluster outbound|53||kube-dns.kube-system.svc.cluster.local
[2019-02-11 08:07:38.552][28][debug][connection] external/envoy/source/common/network/connection_impl.cc:98] [C423] closing data_to_write=0 type=1
[2019-02-11 08:07:38.552][28][debug][connection] external/envoy/source/common/network/connection_impl.cc:133] [C423] closing socket: 1
[2019-02-11 08:07:38.552][28][debug][filter] src/envoy/tcp/mixer/filter.cc:134] Called tcp filter onEvent: 1
[2019-02-11 08:07:38.552][28][debug][main] external/envoy/source/server/connection_handler_impl.cc:218] [C423] new connection
[2019-02-11 08:07:38.552][28][debug][filter] src/envoy/tcp/mixer/filter.cc:33] Called tcp filter : ~Filter
[2019-02-11 08:07:39.746][23][debug][main] external/envoy/source/server/server.cc:126] flushing stats
Pods in Dev and Prod are absolutely fine.
We have mTLS enabled on our cluster, and have done since day one. As a result, we have had the following DestinationRule in place for almost 3 months:
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  creationTimestamp: 2018-11-05T17:35:35Z
  generation: 1
  labels:
    app: istio-crd
    chart: istio-crd-1.0.5
    heritage: Tiller
    release: istio-crd
  name: disable-mtls-to-k8s-dns
  namespace: istio-system
  resourceVersion: "80583893"
  selfLink: /apis/networking.istio.io/v1alpha3/namespaces/istio-system/destinationrules/disable-mtls-to-k8s-dns
  uid: 339f9c78-e121-11e8-a96c-42010aa40060
spec:
  host: kube-dns.kube-system.svc.cluster.local
  trafficPolicy:
    tls:
      mode: DISABLE
Because I enjoy shooting in the dark, and because I’d seen TCP DNS queries fail in a similar fashion months ago, before we put this DestinationRule in place, I decided to delete and recreate the DestinationRule, and this fixed the situation:
[atcloud@airflow-web-5d9984d64-2pqpm app]$ dig +tcp www.google.com
; <<>> DiG 9.9.4-RedHat-9.9.4-73.el7_6 <<>> +tcp www.google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 19108
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;www.google.com. IN A
;; ANSWER SECTION:
www.google.com. 299 IN A 108.177.96.147
www.google.com. 299 IN A 108.177.96.103
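For reference, the delete-and-recreate step can be sketched roughly as below. This is only an illustration, not the exact commands that were run: the function names `clean_rule` and `recreate_rule` are made up here, and the manifest is a trimmed copy of the rule shown earlier, with the server-populated metadata (`uid`, `resourceVersion`, `selfLink`, `creationTimestamp`) and the Helm labels left out so it can be created fresh.

```shell
#!/bin/sh
# Print a trimmed copy of the DestinationRule: only the fields a client sets.
# (Labels from the Helm chart are omitted here for brevity; that is an
# assumption of this sketch, not something the original rule lacked.)
clean_rule() {
    cat <<'EOF'
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: disable-mtls-to-k8s-dns
  namespace: istio-system
spec:
  host: kube-dns.kube-system.svc.cluster.local
  trafficPolicy:
    tls:
      mode: DISABLE
EOF
}

# Delete the existing rule, then create it again from the trimmed manifest.
# Nothing runs until this function is called explicitly.
recreate_rule() {
    kubectl -n istio-system delete destinationrule disable-mtls-to-k8s-dns
    clean_rule | kubectl apply -f -
}
```

Calling `recreate_rule` against the affected cluster performs the two steps; `clean_rule` on its own just prints the manifest so it can be reviewed first.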
[2019-02-11 08:15:18.731][28][debug][filter] external/envoy/source/extensions/filters/listener/original_dst/original_dst.cc:18] original_dst: New connection accepted
[2019-02-11 08:15:18.731][28][debug][filter] src/envoy/tcp/mixer/filter.cc:28] Called tcp filter: Filter
[2019-02-11 08:15:18.731][28][debug][filter] src/envoy/tcp/mixer/filter.cc:38] Called tcp filter: initializeReadFilterCallbacks
[2019-02-11 08:15:18.731][28][debug][filter] external/envoy/source/common/tcp_proxy/tcp_proxy.cc:168] [C436] new tcp proxy session
[2019-02-11 08:15:18.731][28][debug][filter] src/envoy/tcp/mixer/filter.cc:98] [C436] Called tcp filter onNewConnection: remote 10.198.13.22:57143, local 10.192.32.10:53
[2019-02-11 08:15:18.731][28][debug][filter] external/envoy/source/common/tcp_proxy/tcp_proxy.cc:305] [C436] Creating connection to cluster outbound|53||kube-dns.kube-system.svc.cluster.local
[2019-02-11 08:15:18.731][28][debug][connection] external/envoy/source/common/network/connection_impl.cc:572] [C437] connecting to 10.198.10.9:53
[2019-02-11 08:15:18.731][28][debug][connection] external/envoy/source/common/network/connection_impl.cc:581] [C437] connection in progress
[2019-02-11 08:15:18.731][28][debug][main] external/envoy/source/server/connection_handler_impl.cc:218] [C436] new connection
[2019-02-11 08:15:18.732][28][debug][connection] external/envoy/source/common/network/connection_impl.cc:466] [C437] connected
[2019-02-11 08:15:18.733][28][debug][filter] src/envoy/tcp/mixer/filter.cc:105] Called tcp filter completeCheck: OK
[2019-02-11 08:15:18.733][28][debug][filter] src/envoy/tcp/mixer/filter.cc:78] [C436] Called tcp filter onRead bytes: 45
[2019-02-11 08:15:18.737][28][debug][filter] src/envoy/tcp/mixer/filter.cc:88] [C436] Called tcp filter onWrite bytes: 141
[2019-02-11 08:15:18.737][28][debug][filter] src/envoy/tcp/mixer/filter.cc:78] [C436] Called tcp filter onRead bytes: 0
[2019-02-11 08:15:18.738][28][debug][filter] src/envoy/tcp/mixer/filter.cc:88] [C436] Called tcp filter onWrite bytes: 0
[2019-02-11 08:15:18.738][28][debug][connection] external/envoy/source/common/network/connection_impl.cc:451] [C437] remote close
[2019-02-11 08:15:18.738][28][debug][connection] external/envoy/source/common/network/connection_impl.cc:133] [C437] closing socket: 0
[2019-02-11 08:15:18.738][28][debug][connection] external/envoy/source/common/network/connection_impl.cc:98] [C436] closing data_to_write=0 type=0
[2019-02-11 08:15:18.738][28][debug][connection] external/envoy/source/common/network/connection_impl.cc:133] [C436] closing socket: 1
[2019-02-11 08:15:18.738][28][debug][filter] src/envoy/tcp/mixer/filter.cc:132] Called tcp filter onEvent: 1 upstream 10.198.10.9:53
[2019-02-11 08:15:18.738][28][debug][main] external/envoy/source/server/connection_handler_impl.cc:51] [C436] adding to cleanup list
[2019-02-11 08:15:18.738][28][debug][filter] src/envoy/tcp/mixer/filter.cc:33] Called tcp filter : ~Filter
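One difference between the two traces stands out: both contain a `Creating connection to cluster outbound|53||kube-dns...` line, but only the working one goes on to log `connecting to 10.198.10.9:53`. In the failing trace the downstream connection is closed straight away. A rough way to spot this in a saved sidecar log is sketched below; the helper name `upstream_summary` is hypothetical, and it only counts log lines, so treat the result as a hint rather than a diagnosis.

```shell
#!/bin/sh
# Summarise a saved Envoy debug log: how many sessions were routed to a
# cluster, and how many upstream connects were actually attempted.
# Usage: upstream_summary <logfile>
upstream_summary() {
    log="$1"
    # grep -c prints the match count but exits non-zero when it is 0,
    # hence the "|| true" to keep the substitution from failing.
    routed=$(grep -c -F 'Creating connection to cluster' "$log" || true)
    dialled=$(grep -c -F '] connecting to ' "$log" || true)
    echo "routed=$routed dialled=$dialled"
}
```

The log itself can be captured with something like `kubectl logs <pod> -c istio-proxy > proxy.log` (the sidecar container is normally named `istio-proxy`). If `routed` is higher than `dialled`, some sessions were matched to a cluster without any upstream connection being opened, which is the pattern in the failing trace.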
I’m really worried that a DestinationRule appears to have been randomly ignored. It’s worth noting that all other DestinationRules appear to be honoured.
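One way to check whether a given sidecar has actually picked up the rule, rather than inferring it from `dig`, is to dump the Envoy cluster for kube-dns and look for TLS settings on it. The sketch below is a coarse heuristic under stated assumptions: the helper name `tls_state` is hypothetical, and it assumes TLS settings show up in the JSON as a `tlsContext` key (Envoy builds of this era) or a `transportSocket` key (newer builds). A rule with `mode: DISABLE` that is being honoured should leave neither on the cluster.

```shell
#!/bin/sh
# Read Envoy cluster JSON on stdin and report whether it carries TLS settings.
# Assumption: TLS config appears under a "tlsContext" or "transportSocket" key.
tls_state() {
    if grep -q -E '"(tlsContext|transportSocket)"'; then
        echo "tls-configured"
    else
        echo "plaintext"
    fi
}
```

It is meant to be fed from `istioctl proxy-config cluster <pod> --fqdn kube-dns.kube-system.svc.cluster.local -o json | tls_state`, run once against a pod in the broken cluster and once against a healthy one for comparison.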
About this issue
- State: closed
- Created 5 years ago
- Reactions: 3
- Comments: 32 (30 by maintainers)
For anyone else experiencing this, adding a ServiceEntry, and a DestinationRule if you have mTLS enabled, works around the issue: