eventing-kafka-broker: Simple Broker does not become Ready, 502

Hi all, we have a weird issue with a simple broker not becoming ready:

kn broker list
NAME                             URL   AGE   CONDITIONS   READY   REASON
onboarding-ci-kn-kafka-cluster         13m   4 OK / 7     False   ProbeStatus : status: NotReady

Looking at the kafka-controller logs, we see that the probe fails with a 502 Bad Gateway:

kafka-controller-7bc844bb6b-x4frd controller {"level":"debug","ts":"2022-04-20T11:19:23.200Z","logger":"kafka-broker-controller","caller":"broker/broker.go:223","msg":"Updated dispatcher pod annotation","knative.dev/pod":"kafka-controller-7bc844bb6b-x4frd","knative.dev/controller":"knative.dev.eventing-kafka-broker.control-plane.pkg.reconciler.broker.Reconciler","knative.dev/kind":"eventing.knative.dev.Broker","knative.dev/traceid":"eae1957d-361e-4364-a22b-afa2af7241a2","knative.dev/key":"knative-eventing/onboarding-ci-kn-kafka-cluster","action":"reconcile","uuid":"88c7538e-5722-4d83-88ea-bdf01191af7d"}
kafka-controller-7bc844bb6b-x4frd controller {"level":"info","ts":"2022-04-20T11:19:23.200Z","logger":"kafka-broker-controller","caller":"controller/controller.go:543","msg":"Reconcile succeeded","knative.dev/pod":"kafka-controller-7bc844bb6b-x4frd","knative.dev/controller":"knative.dev.eventing-kafka-broker.control-plane.pkg.reconciler.broker.Reconciler","knative.dev/kind":"eventing.knative.dev.Broker","knative.dev/traceid":"eae1957d-361e-4364-a22b-afa2af7241a2","knative.dev/key":"knative-eventing/onboarding-ci-kn-kafka-cluster","duration":0.337743497}
kafka-controller-7bc844bb6b-x4frd controller {"level":"debug","ts":"2022-04-20T11:19:23.200Z","logger":"kafka-broker-controller","caller":"prober/prober.go:63","msg":"Sending probe request","knative.dev/pod":"kafka-controller-7bc844bb6b-x4frd","scope":"prober","pod.metadata.name":"kafka-broker-receiver-7549f88579-jcmk6","address":"http://100.101.218.144:8080/knative-eventing/onboarding-ci-kn-kafka-cluster"}
kafka-controller-7bc844bb6b-x4frd controller {"level":"info","ts":"2022-04-20T11:19:23.201Z","logger":"kafka-broker-controller","caller":"prober/prober.go:86","msg":"Resource not ready","knative.dev/pod":"kafka-controller-7bc844bb6b-x4frd","scope":"prober","pod.metadata.name":"kafka-broker-receiver-7549f88579-jcmk6","address":"http://100.101.218.144:8080/knative-eventing/onboarding-ci-kn-kafka-cluster","statusCode":502}

So the IP belongs to the kafka-broker-receiver pod. To be honest, we have no clue what might be wrong here. It also seems weird to probe the pod directly instead of going through a Service.
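To double-check what the prober is actually targeting, the log line can be parsed directly. The following is a minimal sketch, assuming the `<pod> <container> <json>` line layout shown above; the helper names `parse_probe_line` and `probe_target_is_ip` are made up for illustration and are not part of eventing-kafka-broker.

```python
import ipaddress
import json
from urllib.parse import urlsplit


def parse_probe_line(line: str) -> dict:
    """Split a '<pod> <container> <json>' log line and return the JSON payload."""
    # The JSON object starts at the first '{'; everything before it is the prefix.
    payload = line[line.index("{"):]
    return json.loads(payload)


def probe_target_is_ip(entry: dict) -> bool:
    """Return True when the probed address uses an IP literal as its host."""
    host = urlsplit(entry["address"]).hostname
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


# Abbreviated copy of the "Resource not ready" line from the logs above.
line = (
    'kafka-controller-7bc844bb6b-x4frd controller '
    '{"level":"info","msg":"Resource not ready","scope":"prober",'
    '"pod.metadata.name":"kafka-broker-receiver-7549f88579-jcmk6",'
    '"address":"http://100.101.218.144:8080/knative-eventing/onboarding-ci-kn-kafka-cluster",'
    '"statusCode":502}'
)

entry = parse_probe_line(line)
print(entry["pod.metadata.name"], entry["statusCode"], probe_target_is_ip(entry))
```

For the line above this reports the receiver pod name, the 502 status, and that the probed host is an IP literal rather than a Service DNS name.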

Expected behavior: The broker becomes ready.

Knative release version: 1.3.0

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 16 (8 by maintainers)

Most upvoted comments

I think we should go ahead and test this. Moreover, it makes sense for us to have knative-eventing within istio mesh, as the issues will kind of propagate to knative serving, assuming these kservices are sinks etc. and all our knative-serving resources are within istio-mesh. wdyt @sel-vcc @markhulia ?

Also, thanks for the quick fix @pierDipi . We will report back to you to verify that things work as expected

OK, so I guess it will work since service is exposed via Kubernetes DNS, but I guess it’s advised to use VirtualService for kafka-broker-ingress.

Yes, it should work OK with a k8s Service. As I understand it, knative-eventing no longer has a dependency on Istio, so there is no VirtualService for kafka-broker-ingress.

Hi @matzew ! Sorry for the late reply. The patch worked for us, thanks!

Here’s a gist with patched artifacts https://gist.github.com/pierDipi/b584b0a9167dfeeffd0f934847c1dffa (you have to scroll a bit to find all the files you might need, probably it’s only eventing-kafka-controller.yaml, eventing-kafka-broker.yaml)

I’ve created a patch in #2112. Once the CI jobs have run and are green, is anyone willing to test the patch with Istio and your setup? (I will give you custom manifests unless you want to build the project from source code.)

Thanks @pierDipi, we can definitely test the patch.

I had a quick look through the PR and if I have understood correctly the probing is still based on the Pod IP addresses? We can certainly test the fix, but I’m fairly sure that we cannot connect to those IPs from within Istio. The reason behind this is that the Envoy config provided by Istio is based on the k8s service DNS address, which Envoy can resolve to an IP and match against an incoming request’s authority. However, Envoy does not know about the relationship between the k8s Service and the set of Pods that back it, so Envoy has no upstream config for those Pod IPs and returns a 502 response.
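The explanation above can be pictured with a toy model. This is only a sketch of the reasoning in this comment, not Envoy's real matching logic: it assumes the sidecar knows a set of Service host names and returns 502 for any other host. The Service DNS name in `KNOWN_HOSTS` and the function name `toy_sidecar_status` are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical host table: names the sidecar would know for one k8s Service.
KNOWN_HOSTS = frozenset({
    "kafka-broker-ingress.knative-eventing.svc.cluster.local",
})


def toy_sidecar_status(url: str, known_hosts=KNOWN_HOSTS) -> int:
    """Return 200 when the request host matches a known Service name, else 502."""
    host = urlsplit(url).hostname
    return 200 if host in known_hosts else 502


# Request addressed to the Service DNS name (assumed name, default port).
print(toy_sidecar_status(
    "http://kafka-broker-ingress.knative-eventing.svc.cluster.local/knative-eventing/onboarding-ci-kn-kafka-cluster"
))
# Request addressed to the Pod IP taken from the controller logs.
print(toy_sidecar_status(
    "http://100.101.218.144:8080/knative-eventing/onboarding-ci-kn-kafka-cluster"
))
```

In this model the first call yields 200 and the second 502, which mirrors the symptom in the controller logs: the Pod IP is simply not a host the sidecar has any upstream config for.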

I’ve created a patch in https://github.com/knative-sandbox/eventing-kafka-broker/pull/2112. Once the CI jobs have run and are green, is anyone willing to test the patch with Istio and your setup? (I will give you custom manifests unless you want to build the project from source code.)

Hi @pierDipi, in this case both the kafka-controller and kafka-broker-receiver pods were part of the Istio mesh (injected with istio-proxy sidecar containers). The issue is that it is not possible to connect to a Pod IP address because there is no VirtualService to define the route.

I have confirmed this behaviour with Istio’s sleep and httpbin samples deployed to a namespace with istio-injection enabled:

  • Curl the httpbin service domain name ✅
$ kubectl -n istio-test exec -it svc/sleep -c sleep -- curl -sS -D /dev/stderr -o /dev/null http://httpbin:8000/status/200
HTTP/1.1 200 OK
server: envoy
date: Wed, 20 Apr 2022 12:27:28 GMT
content-type: text/html; charset=utf-8
access-control-allow-origin: *
access-control-allow-credentials: true
content-length: 0
x-envoy-upstream-service-time: 36
  • Curl a httpbin Pod by its IP address ❌
$ kubectl -n istio-test exec -it svc/sleep -c sleep -- curl -sS -D /dev/stderr -o /dev/null http://100.106.34.197:8000/status/200
HTTP/1.1 502 Bad Gateway
date: Wed, 20 Apr 2022 12:27:09 GMT
server: envoy
content-length: 0
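
The same comparison can be scripted when curl is not available in the client container. This is a rough sketch using only the Python standard library; `http_status` and `compare` are illustrative helper names, and the URLs passed in are whatever Service name and Pod IP apply in your cluster.

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 5.0) -> int:
    """Send a GET request and return the HTTP status code, including 4xx/5xx."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is still available on the error.
        return err.code


def compare(service_url: str, pod_url: str) -> dict:
    """Probe the same workload once by Service name and once by Pod IP."""
    return {"service": http_status(service_url), "pod_ip": http_status(pod_url)}
```

Run from a pod inside the mesh, `compare("http://httpbin:8000/status/200", "http://100.106.34.197:8000/status/200")` would be expected to reproduce the 200 versus 502 split shown in the curl output above. Connection-level failures (DNS errors, timeouts) are not caught and will raise.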