pachyderm: Can't deploy Pach to GKE default k8s version
Pachyderm won’t deploy to the latest default version of k8s in GKE. This was reported by a user, and I reproduced the issue. The default version in GKE is 1.8.8-gke.0. For this version or greater, the pachd pod errors out and goes into CrashLoopBackOff with the following serviceaccount-related errors:
time="2018-03-16T20:16:44Z" level=error msg="unable to access kubernetes nodeslist, Pachyderm will continue to work but it will not be possible to use COEFFICIENT parallelism. error: nodes is forbidden: User "system:serviceaccount:default:pachyderm" cannot list nodes at the cluster scope: Unknown user "system:serviceaccount:default:pachyderm""
time="2018-03-16T20:16:44Z" level=error msg="unable to access kubernetes pods, Pachyderm will continue to work but certain pipeline errors will result in pipelines being stuck indefinitely in "starting" state. error: unknown (get pods)"
time="2018-03-16T20:16:44Z" level=error msg="unable to access kubernetes pods, Pachyderm will continue to work but get-logs will not work. error: pods is forbidden: User "system:serviceaccount:default:pachyderm" cannot list pods in the namespace "default": Unknown user "system:serviceaccount:default:pachyderm""
time="2018-03-16T20:16:44Z" level=error msg="unable to create kubernetes replication controllers, Pachyderm will not function properly until this is fixed. error: replicationcontrollers is forbidden: User "system:serviceaccount:default:pachyderm" cannot create replicationcontrollers in the namespace "default": Unknown user "system:serviceaccount:default:pachyderm""
time="2018-03-16T20:16:44Z" level=error msg="unable to delete kubernetes replication controllers, Pachyderm function properly but pipeline cleanup will not work. error: replicationcontrollers "ceb8a1da36ad4700811aa32da3ea8c29" is forbidden: User "system:serviceaccount:default:pachyderm" cannot delete replicationcontrollers in the namespace "default": Unknown user "system:serviceaccount:default:pachyderm""
2018-03-16T20:16:44Z INFO authclient.API.GetCapability {"request":{}}
2018-03-16T20:16:44Z INFO authclient.API.GetCapability {"duration":0.001143887,"request":{},"response":{"capability":"5273272262ac4b06a76752cce2582e35"}}
endpoints "pachd" is forbidden: User "system:serviceaccount:default:pachyderm" cannot get endpoints in the namespace "default": Unknown user "system:serviceaccount:default:pachyderm"
However, if you use --cluster-version 1.7.14-gke.1 or earlier, everything seems to be ok.
To reproduce, follow the GCP docs for deployment with Pach version 1.7.0rc2.
About this issue
- State: closed
- Created 6 years ago
- Reactions: 3
- Comments: 18 (10 by maintainers)
This is the workaround that worked for me with the current default version 1.8.8-gke.0 in GCP and Pachyderm version 1.7.0. I’ve also installed Pachyderm on a separate namespace. The installation uses the default RBAC setup as per the official documentation [1].
[1] http://pachyderm.readthedocs.io/en/latest/deployment/google_cloud_platform.html
The workaround is as follows, after running the Pachyderm deployment steps:
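The exact commands from the comment are not preserved here. As a minimal sketch, assuming the workaround amounts to binding the subject named in the error messages to the built-in cluster-admin ClusterRole, a manifest along these lines could be applied with `kubectl apply -f`. The binding name `pachyderm-cluster-admin` is hypothetical, and the `default` namespace in the subject is taken from the logs above; adjust it if Pachyderm runs in a different namespace.

```yaml
# Sketch only: grants cluster-admin to the Pachyderm service account user.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  # Hypothetical name; any unused ClusterRoleBinding name works.
  name: pachyderm-cluster-admin
subjects:
  # User string exactly as it appears in the "is forbidden" errors.
  - kind: User
    name: system:serviceaccount:default:pachyderm
    apiGroup: rbac.authorization.k8s.io
roleRef:
  # Built-in ClusterRole with full access to the cluster.
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
```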
The key thing is that the user is set to system:serviceaccount:default:pachyderm and given the cluster-admin role. Is this something that can be set for the serviceaccount settings somewhere?
We should update the docs with the steps needed to ensure the GCP service account has the appropriate permissions for Pachyderm to be deployed properly.