kubeflow: Istio ingress fails to reach ready state on EKS, statuscode 503
/kind bug
Upon deploying Kubeflow 0.7.0 on an EKS cluster with Kubernetes 1.14 (following this guide: https://www.kubeflow.org/docs/aws/deploy/install-kubeflow/), all services start up correctly, and kubectl get ingress -n istio-system returns an address after a few minutes. However, the address is not accessible. Running kubectl -n istio-system describe pod/istio-ingressgateway-xxxxxxxxxx-yyyy then yields:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m default-scheduler Successfully assigned istio-system/istio-ingressgateway-565b894b5f-b56pz to ip-192-168-24-84.ec2.internal
Warning FailedMount 2m kubelet, ip-192-168-24-84.ec2.internal MountVolume.SetUp failed for volume "istio-ingressgateway-service-account-token-tzgwg" : couldn't propagate object cache: timed out waiting for the condition
Warning FailedMount 2m kubelet, ip-192-168-24-84.ec2.internal MountVolume.SetUp failed for volume "ingressgateway-ca-certs" : couldn't propagate object cache: timed out waiting for the condition
Warning FailedMount 2m kubelet, ip-192-168-24-84.ec2.internal MountVolume.SetUp failed for volume "ingressgateway-certs" : couldn't propagate object cache: timed out waiting for the condition
Normal Pulled 2m kubelet, ip-192-168-24-84.ec2.internal Container image "docker.io/istio/proxyv2:1.1.6" already present on machine
Normal Created 2m kubelet, ip-192-168-24-84.ec2.internal Created container istio-proxy
Normal Started 2m kubelet, ip-192-168-24-84.ec2.internal Started container istio-proxy
Warning Unhealthy 1m (x19 over 2m) kubelet, ip-192-168-24-84.ec2.internal Readiness probe failed: HTTP probe failed with statuscode: 503
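When a readiness probe keeps failing like this, it can help to list every container in istio-system that is not reporting ready, rather than describing pods one at a time. Below is a minimal Python sketch, assuming its input is the parsed JSON of kubectl -n istio-system get pods -o json; the function name unready_containers is hypothetical, while the fields it reads (status.containerStatuses[].ready, .name, .restartCount) are standard Kubernetes Pod status fields.

```python
from __future__ import annotations

from typing import Any


def unready_containers(pod_list: dict[str, Any]) -> list[tuple[str, str, int]]:
    """Return (pod, container, restartCount) for each container whose
    status reports ready == False.

    pod_list is assumed to be the parsed JSON of
    `kubectl -n istio-system get pods -o json`, i.e. a v1 List whose
    items are Pod objects. Pods that have not reported any containerStatuses
    yet (for example, still Pending) are skipped.
    """
    result: list[tuple[str, str, int]] = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        # containerStatuses may be absent before the kubelet reports anything
        for status in pod.get("status", {}).get("containerStatuses", []):
            if not status.get("ready", False):
                result.append((pod_name, status["name"], status.get("restartCount", 0)))
    return result
```

While the 503 persists, passing json.loads() of the kubectl output to this function would be expected to include the istio-proxy container of the istio-ingressgateway pod; any other unready control-plane containers it lists are candidates to investigate first.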
I am using kfctl_aws.0.7.0.yaml, not the Cognito version.
Environment:
- Kubeflow version (found at the bottom left corner of the Kubeflow dashboard): 0.7.0
- kfctl version (use kfctl version): 0.7.0
- Kubernetes platform: EKS
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.9", GitCommit:"16236ce91790d4c75b79f6ce96841db1c843e7d2", GitTreeState:"clean", BuildDate:"2019-03-27T14:42:18Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.8-eks-b7174d", GitCommit:"b7174db5ee0e30c94a0b9899c20ac980c0850fc8", GitTreeState:"clean", BuildDate:"2019-10-18T17:56:01Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
About this issue
- State: closed
- Created 5 years ago
- Comments: 20 (4 by maintainers)
I understand the reasoning that this should probably be addressed by the Istio team, but on the other hand, Kubeflow is an ecosystem made up of a great many such projects.
From my perspective, Kubeflow is interesting to the community of machine learning developers precisely because it promises to abstract away much of the underlying complexity. That promise gets watered down considerably when the responsibility for ensuring that the ecosystem works is simply delegated to the projects Kubeflow is built on!
Also, the problem may well already be solved in a newer version of Istio (1.3.5), as @chanwit's comment above seems to suggest, but the default Kubeflow distribution is currently locked to 1.1.6. Based on my tests, it is not fixed in 1.3.1, which was the latest version being discussed as the new default. This is probably quite important for the Kubeflow team to know.
My way of installing Istio 1.3.5 is not the standard way used by kfctl. Basically, I removed all the Istio deployments and services and reinstalled Istio manually via Helm.
Don't mind me. I upgraded Istio to 1.3.5 via Helm and it's working fine now.
@chanwit @karlschriek In that case, could you please share the status of the pods in istio-system? It's possible that the ingress gateway is blocked by Pilot, which is blocked by Galley, which in turn is blocked by Citadel, and so on. Let's make sure your Istio control plane is ready and healthy.
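To make the suggested control-plane check concrete, here is a minimal Python sketch that reports the most upstream component that is not ready. It assumes the dependency chain is exactly the one described in the comment (Citadel, then Galley, then Pilot, then the ingress gateway) and that pods follow an istio-<component>-<hash>-<suffix> naming pattern; both the chain ordering and the helper names (readiness_by_component, first_blocker) are illustrative assumptions, not part of Istio or kfctl. Readiness is read from the standard Pod condition of type Ready.

```python
from __future__ import annotations

from typing import Any

# Assumed startup order, taken from the comment above; not an exhaustive
# model of Istio's control plane.
DEPENDENCY_ORDER = ["citadel", "galley", "pilot", "ingressgateway"]


def readiness_by_component(pod_list: dict[str, Any]) -> dict[str, bool]:
    """Collapse `kubectl -n istio-system get pods -o json` output into
    {component: True if every matching pod has condition Ready=True}.

    Pods are matched by the assumed prefix istio-<component>-.
    """
    ready: dict[str, bool] = {}
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        for component in DEPENDENCY_ORDER:
            if name.startswith(f"istio-{component}-"):
                conditions = pod.get("status", {}).get("conditions", [])
                pod_ready = any(
                    c.get("type") == "Ready" and c.get("status") == "True"
                    for c in conditions
                )
                # A component is ready only if all of its pods are ready
                ready[component] = ready.get(component, True) and pod_ready
    return ready


def first_blocker(ready: dict[str, bool]) -> str | None:
    """Return the most upstream component that is not ready, or None.

    A component missing from `ready` is treated as not ready.
    """
    for component in DEPENDENCY_ORDER:
        if not ready.get(component, False):
            return component
    return None
```

Checking upstream first means that if Galley is down, the sketch points at Galley rather than at the ingress gateway's 503, which under the assumed chain would only be a downstream symptom.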