training-operator: helm permission issue on 1.8.1

When I try to install the operator on a 1.8.1 cluster (GKE) like so

helm install https://storage.googleapis.com/tf-on-k8s-dogfood-releases/latest/tf-job-operator-chart-latest.tgz -n tf-job --wait --replace --set cloud=gke

I get the error

Error: release tf-job failed: namespaces "default" is forbidden: User "system:serviceaccount:kube-system:default" cannot get namespaces in the namespace "default": Unknown user "system:serviceaccount:kube-system:default"

This looks like an RBAC issue. Previously I was using K8s 1.7 so I guess something changed with 1.8 which is why I’m hitting this now.
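
One way to confirm the RBAC theory is to ask the API server whether the account named in the error can read namespaces. This is a minimal sketch, assuming `kubectl auth can-i` is available in your client and that your own user is allowed to impersonate service accounts. The function name `can_get_namespaces` is hypothetical, and the default account is copied from the error message above.

```shell
# Hypothetical helper: asks the API server whether a given account may
# "get namespaces". Defaults to the account named in the error message.
can_get_namespaces() {
    # kubectl prints "yes" or "no" and exits non-zero when access is denied.
    kubectl auth can-i get namespaces \
        --as="${1:-system:serviceaccount:kube-system:default}"
}
```

Calling `can_get_namespaces` with no arguments checks the default service account in kube-system. Passing `system:serviceaccount:kube-system:tiller` checks a dedicated Tiller account instead.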

@sozercan Any idea what’s going on? Is the problem that helm needs to be granted appropriate permissions, as mentioned here?

helm version
Client: &version.Version{SemVer:"v2.4.2", GitCommit:"82d8e9498d96535cc6787a6a9194a76161d29b4c", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.7.0", GitCommit:"08c1144f5eb3e3b636d9775617287cc26e53dba4", GitTreeState:"clean"}

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 18 (6 by maintainers)

Most upvoted comments

@foxish what’s the proper way to set up helm on a GKE cluster running 1.8? Should it just work, or is it expected that I have to run commands like the following (from this post)?

kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
helm init --service-account tiller --upgrade

Just run these commands and it’ll work:

kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
helm init --service-account tiller --upgrade

The Tiller that is bundled with Azure includes a service account and role bindings (as cluster-admin). I am guessing this doesn’t come with GKE?

The TFJob CRD sets up its own service account and role bindings, so that shouldn’t be an issue. This sounds like a permissions problem for Tiller itself. Maybe we can update the docs to include something like this, in case it doesn’t exist:

kubectl create clusterrolebinding tiller-cluster-admin \
    --clusterrole=cluster-admin \
    --serviceaccount=kube-system:default
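
If the docs do pick this up, the "in case it doesn’t exist" part could be made explicit. The following is a rough sketch, not a tested recipe: it uses the dedicated `tiller` service account and the `tiller-cluster-rule` binding name from the earlier comments rather than the default account, and the function name `ensure_tiller_rbac` is hypothetical.

```shell
# Hypothetical helper: create the Tiller service account and its
# cluster-admin binding only when they are not already present.
ensure_tiller_rbac() {
    # "kubectl get" exits non-zero when the object does not exist.
    if ! kubectl get serviceaccount tiller --namespace kube-system >/dev/null 2>&1; then
        kubectl create serviceaccount tiller --namespace kube-system
    fi
    if ! kubectl get clusterrolebinding tiller-cluster-rule >/dev/null 2>&1; then
        kubectl create clusterrolebinding tiller-cluster-rule \
            --clusterrole=cluster-admin \
            --serviceaccount=kube-system:tiller
    fi
}
```

After calling `ensure_tiller_rbac`, running `helm init --service-account tiller --upgrade` as in the comments above would point Tiller at that account. Note that cluster-admin is a very broad grant, so a narrower role may be preferable on shared clusters.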