istio: listener conflicts related to headless services (warnings, errors, etc.)
Hi, I’m seeing the following error in istio-pilot/discovery:
```
2018-06-03T00:27:50.659513Z warn buildSidecarOutboundListeners: listener conflict (TCP current and new TCP) on 0.0.0.0:9094, destination:outbound|9094||prometheus-alertmanager.istio-system.svc.cluster.local, current Listener: (0.0.0.0_9094 name:"0.0.0.0_9094" address:<socket_address:<address:"0.0.0.0" port_value:9094 > > filter_chains:<filters:<name:"envoy.tcp_proxy" config:<fields:<key:"deprecated_v1" value:<bool_value:true > > fields:<key:"value" value:<struct_value:<fields:<key:"route_config" value:<struct_value:<fields:<key:"routes" value:<list_value:<values:<struct_value:<fields:<key:"cluster" value:<string_value:"outbound|9094||alertmanager.data-platform.svc.cluster.local" > > > > > > > > > > fields:<key:"stat_prefix" value:<string_value:"outbound|tcp|9094" > > > > > > > > deprecated_v1:<bind_to_port:<> > )
```
For context, we have two teams running alertmanager, in separate namespaces.
- alertmanager.data-platform.svc.cluster.local
- prometheus-alertmanager.istio-system.svc.cluster.local
This appears to happen when both services use the same port section, e.g.:
```yaml
spec:
  ports:
  - name: cluster
    port: 9094
    protocol: TCP
    targetPort: cluster
```
If I rename the port in one of the services to something else, everything starts working again and the errors in istio-pilot go away.
Adding a prefix resolves it:
```yaml
spec:
  ports:
  - name: http-cluster
    port: 9094
    protocol: TCP
    targetPort: cluster
```
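Some context on why the prefix matters (my understanding, not stated explicitly in the thread): Pilot infers the application protocol from the Kubernetes port *name*, and names without a recognized protocol prefix fall back to plain TCP, which is served from a single wildcard listener per port. A hedged sketch of the convention:

```yaml
# Sketch of Istio's port-naming convention: "<protocol>[-<suffix>]".
# A name with no recognized protocol prefix (e.g. "cluster") is treated
# as plain TCP and served from a shared wildcard 0.0.0.0:9094 listener,
# which is what the two alertmanager services were colliding on.
spec:
  ports:
  - name: http-cluster   # "http-" prefix: Pilot routes by Host header instead
    port: 9094
    protocol: TCP        # the L4 protocol Kubernetes sees; unrelated to Pilot's inference
    targetPort: cluster
```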
About this issue
- State: closed
- Created 6 years ago
- Comments: 36 (35 by maintainers)
Commits related to this issue
- Remove listeners for newer services when conflicts occur. This PR adds the CreationTime to the service struct and populates it in the k8s service registry. When available, services will be sorted by ... — committed to nmittler/istio by nmittler 6 years ago
- Add listeners by creation time and cleanup error reports for conflicts (#7043) * Remove listeners for newer services when conflicts occur. This PR adds the CreationTime to the service struct and p... — committed to istio/istio by deleted user 6 years ago
Hello, I’m really sorry but I’m not OK at all with this thread. We just had an outage caused by the following sequence of events:
One team is running solr in their own namespace, with this service:
Another team, in another namespace, completely unrelated, deployed a headless service on the same port as the http solr service:
This results in:
And it totally broke all communication to http://solr.search-solr, and solr started receiving traffic from airflow! Removing `airflow-worker` restored communication to `search-solr`.

The fact that people working in isolated namespaces can effectively cause service-impacting issues in totally unrelated other namespaces due to their port configuration is not acceptable. We basically cannot use Istio on a multi-tenant cluster where people are empowered to write their own yaml files, if we do not get a resolution to this problem.
@nmittler @sakshigoel12
Looking closely at the problem, this conflict arises due to multiple protocols on the same port with a wildcard listener. Even if we set up listeners on every service IP, the same problem will occur when there are two service entries, one HTTP and one TCP, pointing to two different services. Put another way, the whole of Istio is built on the assumption of client-side load balancing, service discovery, etc. Headless services and stateful sets break that pattern by directly accessing a service instance, instead of accessing through a virtual IP.
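To illustrate the service-entry case (a hypothetical sketch; the names and hosts below are placeholders, not from the thread): two ServiceEntries declaring different protocols on the same port would produce the same wildcard-listener collision:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: legacy-tcp                    # hypothetical
spec:
  hosts: ["tcp.example.internal"]     # hypothetical host
  ports:
  - number: 9094
    name: tcp-legacy
    protocol: TCP                     # wants an opaque 0.0.0.0:9094 TCP listener
  resolution: DNS
---
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: legacy-http                   # hypothetical
spec:
  hosts: ["http.example.internal"]    # hypothetical host
  ports:
  - number: 9094
    name: http-legacy
    protocol: HTTP                    # wants an HTTP route table on the same 0.0.0.0:9094
  resolution: DNS
```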
There are a few ways to solve this IMO:
Either of these options requires a CI/CD or some form of admission control system that can detect port conflicts and prevent the services from being deployed in Kubernetes (and give a helpful error message to the end user). @ayj is it possible to write such an admission control plugin for services? Or is it restricted to just CRDs?
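For reference, validating admission webhooks are not restricted to CRDs; they can intercept core resources such as Services. A minimal sketch of what such a configuration could look like (all names below are hypothetical):

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: service-port-policy          # hypothetical
webhooks:
- name: ports.policy.example.com     # hypothetical webhook name
  rules:
  - apiGroups: [""]                  # core API group: covers v1 Services
    apiVersions: ["v1"]
    operations: ["CREATE", "UPDATE"]
    resources: ["services"]
  clientConfig:
    service:                         # hypothetical in-cluster validator service
      namespace: istio-system
      name: port-conflict-checker
      path: /validate
  admissionReviewVersions: ["v1"]
  sideEffects: None
  failurePolicy: Ignore              # fail open so an outage of the checker can't block all deploys
```

The validator behind it would have to compare the incoming Service's ports and protocol prefixes against existing services cluster-wide and reject conflicting specs with a human-readable message.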
@rshriram has it crossed your mind, even for one second, that consumers of Istio, who don't always understand the internals of the code base, report issues based on their perception of the world and then rely on "the experts" to help them identify the root cause?
This issue details my experiences, and the logs I see. And as I said early on in the issue, I did see a service outage then too but couldn’t quite identify what caused it.
In both circumstances I see the same warnings in Pilot, so I am obviously going to continue to talk on the same thread. From a consumer's perspective, they are related in the sense that in both situations people expect to be able to deploy into a namespace with isolation; Istio breaks that mould, and that is very dangerous.
If you would like to break it up, please go ahead. But stop being difficult and pedantic with someone who is just trying to help you find issues in a product before you fly your flagship “1.0 release”.
Can you remove the `clusterIP: None` from your spec? It seems that when reading from Kubernetes, Pilot is given a cluster IP of 0.0.0.0 for both services, which results in a collision: two listeners on the same IP:port (0.0.0.0:9999).
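In other words, the suggested workaround is to make the service a regular ClusterIP service, so Pilot binds a per-VIP listener instead of a wildcard one. A hedged before/after sketch (the port and port name are placeholders; only the service name and 9999 appear in the thread):

```yaml
# Before: headless; Pilot sees no VIP, so the listener binds 0.0.0.0:9999
apiVersion: v1
kind: Service
metadata:
  name: airflow-worker
spec:
  clusterIP: None        # remove this line to get a ClusterIP assigned
  ports:
  - name: tcp-worker     # hypothetical port name
    port: 9999
---
# After: a normal ClusterIP service; the listener binds <clusterIP>:9999,
# so it no longer collides with other services using the same port
apiVersion: v1
kind: Service
metadata:
  name: airflow-worker
spec:
  ports:
  - name: tcp-worker
    port: 9999
```

Note that this changes Kubernetes semantics too: headless services exist precisely to give StatefulSet pods direct per-pod DNS records, so dropping `clusterIP: None` is not always an acceptable fix.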