kiali: Multi-cluster discovery leads to time out while fetching server configs, blocking access

I installed Kiali 1.33.1 and it is not loading on one cluster in nonprod and one in prod, although it works on every other cluster (sometimes with slow login times). The experience was the same with 1.33.0.

When I revert to 1.32.0, it works.

The browser says:

You are logged in, but there was a problem when fetching some required server configurations. Please, try refreshing the page.

All I see in the logs is:

2021-04-26T19:00:50Z INF Not handling OpenId code flow authentication: No nonce code present. Login window timed out.
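The log message suggests the OpenID nonce issued when the login window opened had expired (or was gone) by the time the code flow callback arrived. As a rough illustration only (the function names, in-memory store, and 30-second window are assumptions, not Kiali's actual implementation), nonce validation with a login window could look like:

```python
import secrets
import time

LOGIN_WINDOW_SECONDS = 30  # assumption: the ~30s login window discussed in this thread

_issued = {}  # nonce -> issue timestamp (stands in for the nonce cookie)

def issue_nonce() -> str:
    """Issued when the login window opens; sent with the OpenID authorization request."""
    nonce = secrets.token_urlsafe(16)
    _issued[nonce] = time.monotonic()
    return nonce

def validate_nonce(nonce: str) -> bool:
    """On the code-flow callback: the nonce must exist and still be inside the window."""
    issued_at = _issued.pop(nonce, None)
    if issued_at is None:
        return False  # corresponds to "No nonce code present"
    # corresponds to "Login window timed out" when this check fails
    return time.monotonic() - issued_at <= LOGIN_WINDOW_SECONDS
```

If the server is still busy fetching configs when the callback comes in, or the nonce has aged out, the flow is rejected with a message like the one above.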

Kiali server Helm values:

auth:
  openid:
    client_id: [redacted]
    disable_rbac: true
    issuer_uri: [redacted]
    username_claim: email
  strategy: openid
deployment:
  affinity:
    pod_anti:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - prometheus
          topologyKey: topology.kubernetes.io/zone
        weight: 100
  ingress_enabled: false
  node_selector:
    cloud.google.com/gke-nodepool: monitoring
  pod_anti:
    hack: true
  view_only_mode: true
external_services:
  grafana:
    auth:
      password: [redacted]
      type: basic
      username: admin
    in_cluster_url: http://prometheus-stack-core-grafana:80/
    url: https://core-forbes-development.grafana.forbes.com
  prometheus:
    url: http://core-prometheus:9090/
fullnameOverride: kiali-core
istio_namespace: istio-system
nameOverride: kiali-core

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 30 (13 by maintainers)

Most upvoted comments

Hello @rwong2888 Kiali v1.34.1 has been released. It should contain the fix you need.

Please, try it and tell us if it fixes the login issue.

We can’t commit to it being merged today, but in any case it will be cherry-picked if it doesn’t make it into 1.34.0, and it will then be available in a 1.34.1 release.

I am okay with waiting for v1.34 for the timeout increase as a workaround. What will the timeout be? I am assuming it is just the read timeout, or is it both read and write?

I think for the release, it’s possible to have a proper fix, rather than just extending the timeout.

I was mentioning a possible workaround in case some settings could be adjusted, so that you could use version 1.33 straight away. But Kiali has these timeouts hard-coded, so, since a release is needed anyway, I think it’s better to do a proper fix than a workaround.
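Since the comment above says the deadline is hard-coded, here is a minimal, purely illustrative sketch (not Kiali's actual code; the function names and the 0.1-second deadline are assumptions) of how a fixed deadline around the server-config fetch produces exactly this failure mode:

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
import time

FETCH_TIMEOUT_SECONDS = 0.1  # hard-coded deadline (illustrative; the real value differs)

def fetch_server_configs(simulated_latency: float) -> dict:
    """Stand-in for the cross-cluster server-config fetch."""
    time.sleep(simulated_latency)
    return {"status": "ok"}

def fetch_with_deadline(simulated_latency: float):
    """Abandon the fetch once the fixed deadline elapses."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch_server_configs, simulated_latency)
        try:
            return future.result(timeout=FETCH_TIMEOUT_SECONDS)
        except FutureTimeout:
            # The UI then shows "there was a problem when fetching some
            # required server configurations".
            return None
```

Because the deadline is a constant rather than a setting, no Helm value can extend it; only a new release can.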

I am curious: why is it that the second cluster in my mesh can log in before the 30s timeout? You mentioned the number of namespaces?

Kiali does a “walk” over each namespace of each “remote” cluster (skipping the local one) to discover other Kiali instances. I guess the cluster where Kiali works OK is the one with the greater number of namespaces, while the rest of the clusters have fewer. Since that Kiali has a smaller number of remote namespaces to walk through, it can finish in time, while the other Kiali instance can’t because of the larger list it needs to walk.
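The asymmetry described above can be sketched numerically. The per-namespace latency and the namespace counts below are made-up assumptions purely to show why discovery time scales with the number of *remote* namespaces:

```python
PER_NAMESPACE_SECONDS = 0.5  # assumed average API round-trip per namespace
LOGIN_TIMEOUT_SECONDS = 30   # the timeout mentioned in this thread

def discovery_time(remote_namespace_counts: list[int]) -> float:
    """Sequential walk over every namespace of every remote cluster
    (the local cluster is skipped)."""
    return sum(remote_namespace_counts) * PER_NAMESPACE_SECONDS

# Kiali in the large cluster only has to walk the small remote cluster:
big_cluster_view = discovery_time([20])    # 10s, well inside the login window

# Kiali in the small cluster must walk the large remote cluster:
small_cluster_view = discovery_time([120])  # 60s, past the 30s login window
```

Under these assumed numbers, the instance with few remote namespaces logs in fine while the other one times out, matching the behavior reported in this issue.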