cloud-sql-proxy: net/http: timeout awaiting response headers

I am getting a lot of net/http: timeout awaiting response headers errors from the proxy in my GKE cluster. I'm connecting to a regional PostgreSQL 11 instance with a private IP address, on the production maintenance release channel, that requires SSL.

I'm getting failures on certificate refresh requests:

ephemeral certificate for instance <my-project>:<my-region>:<my-instance> will expire soon, refreshing now.
failed to refresh the ephemeral certificate for <my-project>:<my-region>:<my-instance> before expering: Post https://www.googleapis.com/sql/v1beta4/projects/<my-project>/instances/<my-instance>/createEphemeral?alt=json&prettyPrint=false: Get http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fsqlservice.admin: net/http: timeout awaiting response headers

and on connection requests:

New connection for "<my-project>:<my-region>:<my-instance>"
couldn't connect to "<my-project>:<my-region>:<my-instance>": Post https://www.googleapis.com/sql/v1beta4/projects/<my-project>/instances/<my-instance>/createEphemeral?alt=json&prettyPrint=false: Get http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fsqlservice.admin: net/http: timeout awaiting response headers

I've got a lot of these messages. For example, there were ~25 failed connection attempts yesterday between 15:33:15 and 15:38:58.
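The error suggests the proxy is timing out while fetching a service-account access token from the metadata server, before it even reaches the SQL Admin API. As a sanity check, the token endpoint from the log can be queried directly from a pod; this is only a rough sketch, assuming a pod that has curl available, and <some-pod> is a placeholder:

$ kubectl exec <some-pod> -- curl -s -H "Metadata-Flavor: Google" \
    "http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token?scopes=https://www.googleapis.com/auth/sqlservice.admin"

A prompt JSON response containing an access_token means the (GKE) metadata server is reachable from that pod; a hang here reproduces the same timeout the proxy reports.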

Deployment’s spec:

spec:
  serviceAccount: sql-proxy
  containers:
    - command:
        - /cloud_sql_proxy
        - -ip_address_types=PRIVATE
        - -instances=<my-project>:<my-region>:<my-insatnce>=tcp:0.0.0.0:5432
      image: gcr.io/cloudsql-docker/gce-proxy:1.16
      imagePullPolicy: IfNotPresent
      name: sqlproxy
      ports:
        - name: postgresql
          protocol: TCP
          containerPort: 5432
      resources:
        requests:
          cpu: 100m
          memory: 100Mi
  dnsPolicy: ClusterFirst
  restartPolicy: Always
  terminationGracePeriodSeconds: 30
  securityContext:
    runAsNonRoot: true
    runAsUser: 2

I'm not sure if this matters, but here is the Kubernetes setup: I've got several GKE deployments with multiple replicas connecting to a single Cloud SQL Proxy deployment (also with multiple replicas).

| deployment 1 | ----> |
                       | ClusterIP | ----> | Cloud SQL Proxy | ---->
| deployment N | ----> |
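For reference, a ClusterIP Service like the one in the diagram can be created with kubectl expose; this is only an illustrative sketch, and the sqlproxy Deployment/Service names are assumptions rather than values taken verbatim from this setup:

$ kubectl expose deployment sqlproxy --name=sqlproxy \
    --port=5432 --target-port=5432 --type=ClusterIP

The application deployments then use the Service DNS name (e.g. sqlproxy.<namespace>.svc.cluster.local:5432) as their PostgreSQL host.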

All deployments use the same Cloud SQL instance, but connect to different PostgreSQL databases (with different credentials). The GKE cluster has Workload Identity enabled, and the Cloud SQL Proxy deployment uses a service account with the roles/cloudsql.client role.
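For completeness, the Workload Identity wiring behind this is roughly the following sketch; the Google service account name (sql-proxy-gsa@...) and the default namespace are placeholders, not the real values:

$ gcloud iam service-accounts add-iam-policy-binding \
    sql-proxy-gsa@<my-project>.iam.gserviceaccount.com \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:<my-project>.svc.id.goog[default/sql-proxy]"

$ kubectl annotate serviceaccount sql-proxy \
    iam.gke.io/gcp-service-account=sql-proxy-gsa@<my-project>.iam.gserviceaccount.com

$ gcloud projects add-iam-policy-binding <my-project> \
    --member "serviceAccount:sql-proxy-gsa@<my-project>.iam.gserviceaccount.com" \
    --role roles/cloudsql.client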

Any ideas why this happens and how I can fix it? Many thanks!

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 2
  • Comments: 23 (10 by maintainers)

Most upvoted comments

Fixed in #364.

That error seems quite different from the one mentioned in this issue (an HTTP 403 error, not an HTTP timeout). Please file a new issue.

On Thu, Apr 30, 2020 at 10:36 PM, Denis Loginov wrote:

I believe we've run across a very similar issue in the past 2 days. The only difference is that we don't get the net/http: timeout awaiting response headers message, just errors like these:

2020/04/28 22:24:04 failed to refresh the ephemeral certificate for <project>:<region>:<instance> before expering: Post https://www.googleapis.com/sql/v1beta4/projects/<project>/instances/<instance>/createEphemeral?alt=json&prettyPrint=false: compute: Received 403 ` Unable to generate token; IAM returned 403 Forbidden: Request had insufficient authentication scopes. This error could be caused by a missing IAM policy binding on the target IAM service account. You can create the necessary policy binding with: …

This doesn't appear to be related to auth scopes, as we do use Workload Identity, and I verified that the same request made with curl from the affected Pods succeeds.

Do you think that deserves a separate issue?

What library and version is your code using to reach the metadata server endpoint (http://metadata.google.internal/computeMetadata/v1/***)?

What is the GKE master version? Perhaps you can try updating it and seeing whether that stops the net/http timeouts from occurring. You can always find the available master versions using this command: $ gcloud container get-server-config
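For example, something along these lines (cluster name and zone are placeholders):

$ gcloud container get-server-config --zone <my-zone>

$ gcloud container clusters describe <my-cluster> --zone <my-zone> \
    --format="value(currentMasterVersion)"

$ gcloud container clusters upgrade <my-cluster> --zone <my-zone> \
    --master --cluster-version <newer-version>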

I suspect outdated versions may not be able to successfully get tokens from the metadata server.