kubeflow: [GCP] CLI deployment (kfctl) fails to create cloud endpoint correctly
I ran the deployment several times, following the instructions at https://www.kubeflow.org/docs/gke/deploy/deploy-cli/. Everything seems to run fine, but when I try to reach the endpoint <kfapp>.endpoints.<project>.cloud.goog the browser reports DNS_PROBE_FINISHED_NXDOMAIN. Indeed, no endpoint shows up in the Cloud Console or via the CLI.
I checked the pod logs for cloud-endpoints-controller-... and see the following log lines repeating every second or so:
2019/05/09 09:44:49 [DEBUG][<kfapp>] Changed because parent sig different
2019/05/09 09:44:49 [DEBUG][<kfapp>] Changed because ingress target IP changed
2019/05/09 09:44:50 [INFO][<kfapp>] Service does not yet exist, creating: <kfapp>.endpoints.<project>.cloud.goog
2019/05/09 09:44:51 [ERROR] Could not sync state: [ERROR] Failed to creat Cloud Endpoints service: serviceName: <kfapp>.endpoints.<project>.cloud.goog, err: googleapi: Error 400: Service <kfapp>.endpoints.<project>.cloud.goog has been deleted and will be purged after 30 days. To reuse this service, please undelete the service following https://cloud.google.com/service-management/create-delete., failedPrecondition
I removed the kfapp and project names on purpose for potential security reasons; the real values are well-formed, or at least nothing complains about them.
I don’t see any earlier issues (access denied, etc.). There are no errors during the kfctl run either. I also tried specifying the versions -v v0.5.0 and -v v0.5.1, but both give the same result.
Before this I tried the web UI deployment and it worked, but I wanted to customize the deployment and test different machine pool settings.
Not sure if it’s relevant, but I run it on Windows 10 WSL Ubuntu, so from the application's perspective it is, at least theoretically, Linux.
About this issue
- State: closed
- Created 5 years ago
- Comments: 16 (7 by maintainers)
The finding so far, aside from the quota issue, which is not occurring at the moment, is that if one creates a deployment with one kfapp name, let's say kubeflow, then the first deployment works fine and the endpoint gets created, but if we do:
Then the second deployment goes through fine, but the endpoint doesn’t get created, so the deployment is not usable, at least not easily.
Further inspection shows that iap-enabler keeps crashing:
The endpoint-controller logs, despite working, look as follows:
The list of endpoint services is empty, but I tried undeleting the service with the command below, which returns an error:
After the above operation the endpoint-controller logs show:
And after undeleting and waiting for a while the endpoint is again working.
Hence there should be either a deployment command or an endpoint-controller command to create or undelete the endpoint, as creating a recently deleted endpoint doesn’t seem to work at the moment.
Issue-Label Bot is automatically applying the label kind/bug to this issue, with a confidence of 0.79.
The correct fix is for the cloud-endpoints controller to handle the case where the endpoint has been deleted, and to undelete it.
The cloud-endpoints controller we are using is set here: https://github.com/kubeflow/manifests/blob/ffede944f18343271f526bd217cde2edbe6e0e38/gcp/cloud-endpoints/base/deployment.yaml#L13
gcr.io/cloud-solutions-group/cloud-endpoints-controller:0.2.1
Source is here: https://github.com/danisla/cloud-endpoints-controller
I don’t see any logic in the controller to deal with this use case: https://github.com/danisla/cloud-endpoints-controller/blob/master/cmd/cloud-endpoints-controller/main.go
So it looks like we still need to fix the controller to work with this.
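To make the intended behaviour concrete, here is a rough sketch of the create-or-undelete fallback described above. It is written in Python against an in-memory stand-in rather than the real Service Management client, and it is not the controller's actual code (which is in Go). The names FakeServiceManager, ServiceDeletedError and ensure_service are hypothetical and exist only for illustration.

```python
class ServiceDeletedError(Exception):
    """Hypothetical stand-in for the 400 failedPrecondition error in the logs."""


class FakeServiceManager:
    """In-memory stand-in for the Service Management API (illustration only)."""

    def __init__(self, soft_deleted=()):
        self.active = set()
        self.soft_deleted = set(soft_deleted)

    def create(self, name):
        # Mirrors the observed behaviour: a recently deleted name cannot be re-created.
        if name in self.soft_deleted:
            raise ServiceDeletedError(
                f"Service {name} has been deleted and will be purged after 30 days."
            )
        self.active.add(name)

    def undelete(self, name):
        # Restores a soft-deleted service so it becomes active again.
        self.soft_deleted.discard(name)
        self.active.add(name)


def ensure_service(api, name):
    """Create the service, falling back to undelete if it was recently deleted."""
    if name in api.active:
        return "exists"
    try:
        api.create(name)
        return "created"
    except ServiceDeletedError:
        api.undelete(name)
        return "undeleted"
```

With a fallback like this, the sync loop would recover on its own instead of logging the same error every second. A real fix would also have to wait for the undelete operation to complete before continuing, since in the report above the endpoint only came back after some time.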
A longer-term solution might be for Kubernetes Config Connector (https://github.com/GoogleCloudPlatform/k8s-config-connector) to support Cloud Endpoints.