kiali: Multi-cluster discovery leads to time out while fetching server configs, blocking access

I installed Kiali 1.33.1 and it is not loading on one cluster in nonprod and one in prod, although it works on every other cluster (sometimes with slow login times). The experience was the same with 1.33.0.

When I revert to 1.32.0, it works.

The browser says:

You are logged in, but there was a problem when fetching some required server configurations. Please, try refreshing the page.

All I see in the logs is:

2021-04-26T19:00:50Z INF Not handling OpenId code flow authentication: No nonce code present. Login window timed out.
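The log message suggests the OpenID nonce issued when the login window opened had expired (or was gone) by the time the code flow callback arrived. As a rough illustration only (the function names, in-memory store, and 30-second window are assumptions, not Kiali's actual implementation), nonce validation with a login window could look like:

```python
import secrets
import time

LOGIN_WINDOW_SECONDS = 30  # assumption: the ~30s login window discussed in this thread

_issued = {}  # nonce -> issue timestamp (stands in for the nonce cookie)

def issue_nonce() -> str:
    """Issued when the login window opens; sent with the OpenID authorization request."""
    nonce = secrets.token_urlsafe(16)
    _issued[nonce] = time.monotonic()
    return nonce

def validate_nonce(nonce: str) -> bool:
    """On the code-flow callback: the nonce must exist and still be inside the window."""
    issued_at = _issued.pop(nonce, None)
    if issued_at is None:
        return False  # corresponds to "No nonce code present"
    # corresponds to "Login window timed out" when this check fails
    return time.monotonic() - issued_at <= LOGIN_WINDOW_SECONDS
```

If the server is still busy fetching configs when the callback comes in, or the nonce has aged out, the flow is rejected with a message like the one above.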

Kiali server Helm values:

auth:
  openid:
    client_id: [redacted]
    disable_rbac: true
    issuer_uri: [redacted]
    username_claim: email
  strategy: openid
deployment:
  affinity:
    pod_anti:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - prometheus
          topologyKey: topology.kubernetes.io/zone
        weight: 100
  ingress_enabled: false
  node_selector:
    cloud.google.com/gke-nodepool: monitoring
  pod_anti:
    hack: true
  view_only_mode: true
external_services:
  grafana:
    auth:
      password: [redacted]
      type: basic
      username: admin
    in_cluster_url: http://prometheus-stack-core-grafana:80/
    url: https://core-forbes-development.grafana.forbes.com
  prometheus:
    url: http://core-prometheus:9090/
fullnameOverride: kiali-core
istio_namespace: istio-system
nameOverride: kiali-core

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 30 (13 by maintainers)

Most upvoted comments

Hello @rwong2888 Kiali v1.34.1 has been released. It should contain the fix you need.

Please, try it and tell us if it fixes the login issue.

We can’t commit to it being merged today, but in any case it will be cherry-picked if it doesn’t make it into 1.34.0, and it will then be available in a 1.34.1 release.

I am okay with waiting for v1.34 for the timeout increase as a workaround. What will the timeout be? I am assuming it is just the read timeout, or is it both read and write?

I think for the release, it’s possible to have a proper fix, rather than just extending the timeout.

I was mentioning a possible workaround in case some settings could be adjusted, so that you could use version 1.33 straight away. But Kiali has these timeouts hard-coded, so, since a release is needed anyway, I think it’s better to do a proper fix than a workaround.
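Since the comment above says the deadline is hard-coded, here is a minimal, purely illustrative sketch (not Kiali's actual code; the function names and the 0.1-second deadline are assumptions) of how a fixed deadline around the server-config fetch produces exactly this failure mode:

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
import time

FETCH_TIMEOUT_SECONDS = 0.1  # hard-coded deadline (illustrative; the real value differs)

def fetch_server_configs(simulated_latency: float) -> dict:
    """Stand-in for the cross-cluster server-config fetch."""
    time.sleep(simulated_latency)
    return {"status": "ok"}

def fetch_with_deadline(simulated_latency: float):
    """Abandon the fetch once the fixed deadline elapses."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch_server_configs, simulated_latency)
        try:
            return future.result(timeout=FETCH_TIMEOUT_SECONDS)
        except FutureTimeout:
            # The UI then shows "there was a problem when fetching some
            # required server configurations".
            return None
```

Because the deadline is a constant rather than a setting, no Helm value can extend it; only a new release can.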

I am curious: why is it that the second cluster in my mesh can log in before the 30s timeout? You mentioned the number of namespaces?

Kiali does a “walk” over each namespace of each “remote” cluster (skipping the local one) to discover other Kiali instances. I guess the cluster where Kiali works OK is the one with the greater number of namespaces, while the rest of the clusters have fewer. Since that Kiali has a smaller number of remote namespaces to walk through, it can finish in time, while the other Kiali instance can’t because of the larger list it needs to walk.
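The asymmetry described above can be sketched numerically. The per-namespace latency and the namespace counts below are made-up assumptions purely to show why discovery time scales with the number of *remote* namespaces:

```python
PER_NAMESPACE_SECONDS = 0.5  # assumed average API round-trip per namespace
LOGIN_TIMEOUT_SECONDS = 30   # the timeout mentioned in this thread

def discovery_time(remote_namespace_counts: list[int]) -> float:
    """Sequential walk over every namespace of every remote cluster
    (the local cluster is skipped)."""
    return sum(remote_namespace_counts) * PER_NAMESPACE_SECONDS

# Kiali in the large cluster only has to walk the small remote cluster:
big_cluster_view = discovery_time([20])    # 10s, well inside the login window

# Kiali in the small cluster must walk the large remote cluster:
small_cluster_view = discovery_time([120])  # 60s, past the 30s login window
```

Under these assumed numbers, the instance with few remote namespaces logs in fine while the other one times out, matching the behavior reported in this issue.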