linkerd2: GKE Private Clusters Cannot Proxy

Bug Report

It appears that GKE private clusters cannot proxy requests via kubectl proxy. This breaks the linkerd dashboard and linkerd check commands. After looking into it further, I don't think this is a linkerd issue: I cannot proxy any service at all on a GKE private cluster.

What is the issue?

linkerd dashboard
linkerd check

Both of these fail because the proxy times out.
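The same timeout can be reproduced without the linkerd CLI by calling the controller API through the apiserver proxy directly. A sketch (the service path is taken from the check output below; the port is a placeholder):

```shell
# Proxy the Kubernetes API to localhost (runs in the background)
kubectl proxy --port=8001 &

# Call the linkerd controller API through the apiserver proxy.
# On a GKE private cluster this hangs and eventually times out,
# because the master cannot reach the pod network behind the service.
curl --max-time 10 \
  http://127.0.0.1:8001/api/v1/namespaces/linkerd/services/linkerd-controller-api:http/proxy/api/v1/SelfCheck
```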

How can it be reproduced?

Create a GKE private cluster and install linkerd.
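For reference, creating a private cluster can be sketched roughly as follows (cluster name, zone, and CIDR are placeholders; check the gcloud docs for the full set of required private-cluster flags):

```shell
# Create a private GKE cluster; private nodes require VPC-native (IP alias) networking
gcloud container clusters create test-private \
  --zone us-east1-b \
  --enable-private-nodes \
  --enable-ip-alias \
  --master-ipv4-cidr 172.16.3.0/28

# Install linkerd into the cluster
linkerd install | kubectl apply -f -

# This then times out
linkerd check
```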

Logs, error output, etc


linkerd check output

NYJKurzMBP:linkerd2 joshua.kurz$ linkerd check --verbose
kubernetes-api: can initialize the client..................................[ok]
kubernetes-api: can query the Kubernetes API...............................[ok]
kubernetes-api: is running the minimum Kubernetes API version..............[ok]
linkerd-api: control plane namespace exists................................[ok]
linkerd-api: control plane pods are ready..................................[ok]
DEBU[0000] Expecting API to be served over [https://35.237.232.208/api/v1/namespaces/linkerd/services/linkerd-controller-api:http/proxy/api/v1/]
linkerd-api: can initialize the client.....................................[ok]
DEBU[0000] Making gRPC-over-HTTP call to [https://35.237.232.208/api/v1/namespaces/linkerd/services/linkerd-controller-api:http/proxy/api/v1/SelfCheck] []
DEBU[0005] Error invoking [https://35.237.232.208/api/v1/namespaces/linkerd/services/linkerd-controller-api:http/proxy/api/v1/SelfCheck]: Post https://35.237.232.208/api/v1/namespaces/linkerd/services/linkerd-controller-api:http/proxy/api/v1/SelfCheck: context deadline exceeded
linkerd-api: can query the control plane API...............................[FAIL] -- Post https://35.237.232.208/api/v1/namespaces/linkerd/services/linkerd-controller-api:http/proxy/api/v1/SelfCheck: context deadline exceeded

Environment

  • Kubernetes Version:
kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:53:20Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.7-gke.11", GitCommit:"fa90543563c9cfafca69128ce8cd9ecd5941940f", GitTreeState:"clean", BuildDate:"2018-11-08T20:22:21Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}
  • Cluster Environment: GKE
  • Host OS:
  • Linkerd version:
NYJKurzMBP:linkerd2 joshua.kurz$ linkerd version
Client version: stable-2.1.0

Possible solution

I think we need to dig into what is going on in GKE. It really seems like a GKE issue rather than a linkerd one, but I'm opening it here for visibility. I'm curious what you all think a good solution would be, and whether you know how to raise awareness of this.

Additional context

@sudermanjr also mentioned this issue in https://github.com/linkerd/linkerd2/issues/1696

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Comments: 17 (12 by maintainers)

Most upvoted comments

I found this while searching for the issue of GKE private clusters not being able to connect to Kubernetes services. Since I found the solution, I will paste it here in case it is helpful.

When you run kubectl proxy, the requests to your services will have a source IP from the master_ipv4_cidr_block you defined, for example 172.16.3.0/28.

By default the GKE cluster creates a rule like this:

Name: gke-test-11223344-master
Type: Ingress
Targets: gke-test-11223344--node
IP ranges: 172.16.3.0/28
Protocols/ports: tcp:10250,443
Action: Allow
Priority: 1000
Network: vpc

You have to add an extra VPC firewall rule to your cluster allowing the ports you want to access from your master CIDR block.
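Concretely, the extra rule can be sketched like this (rule name, network, and target tag are placeholders; pick the ports you actually need):

```shell
# Allow the master CIDR to reach additional ports on the nodes.
# The node target tag can be read from any node with:
#   gcloud compute instances describe <node> --format='value(tags.items)'
gcloud compute firewall-rules create allow-master-to-services \
  --network <vpc_name> \
  --source-ranges 172.16.3.0/28 \
  --target-tags <node_target_tag> \
  --allow tcp:8443,8089
```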

Hey, wanted to add my 2 cents.

GKE in private mode has trouble integrating with webhooks from several projects because of the strict default firewall rules it ships with. See this write-up for cert-manager: https://www.revsys.com/tidbits/jetstackcert-manager-gke-private-clusters/

To get all of the webhooks working and proxying for Linkerd in my cluster (which runs both linkerd and cert-manager), I created a firewall rule like this:

# List compute instances in cluster
gcloud compute instances list

# Get tag for firewall rule
# Look in the "tags" fields
gcloud compute instances describe --format=json <node_from_previous_command>

# Create the firewall rule
gcloud compute firewall-rules create <firewall_name> \
  --source-ranges <master_cidr> \
  --target-tags <target_tag> --network <private_network_name>  \
  --allow TCP:8443,8089,6443 # ports for linkerd webhook + top/tap and cert manager

After setting up this firewall rule, the auto-injection hook works and the top/tap commands work without an issue. Hope this helps!

@Pothulapati Sounds good to me. For reference, you'll need to change the external API client that the CLI uses so it stops going through the Kubernetes apiserver proxy. We build that client here:

https://github.com/linkerd/linkerd2/blob/28fb72590145d30f61629bc4b1b6481e5637ba5c/controller/api/public/client.go#L220-L234

@joshkurz thank you! This is a great reason to move over to port-forward instead of proxy.