cloud-sql-proxy: x509: certificate signed by unknown authority

Last night around 11:02 pm, our batch workers on GAE reported losing connection to MySQL multiple times in the middle of running some queries.

I checked our production MySQL instance for anything out of the ordinary, but its CPU and memory usage were relatively low, as was the number of connections. I was even able to use our web app, which uses that same database, while these errors were appearing, confirming that our MySQL instance was up, running, and responding to requests. In other words, the instance was not under stress and appeared to be functioning correctly. The issue went away on its own around 11:20 pm with no action on my part.

Looking at the logs, before every 'Lost connection to MySQL server during query' entry, there was this error:

2019-08-07 23:05:05.000 EDT
2019/08/08 03:05:05 couldn't connect to "our-production-env:us-somewhere:our-mysql-db": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "Google Cloud SQL Server CA")

These errors showed up a few times on our two GAE batch applications, immediately followed by “Lost Connection to MySQL”. This hasn’t happened since.

Any insight as to what may be responsible for this? I’m worried it will happen again and I have no real lead.

Thanks in advance!

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 19 (8 by maintainers)

Most upvoted comments

For anyone reading this who has the same issue while running the proxy in Docker, check several things:

  1. Make sure your Dockerfile uses a fairly recent base image; update it if it does not.
  2. Install the ca-certificates package.
  3. Run update-ca-certificates within the runner image of your Dockerfile.
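A quick way to sanity-check the steps above from inside the container is to look for a CA bundle on disk. This is only a rough sketch: it is written in Python for brevity, while the proxy itself is a Go binary that does its own root lookup, so a passing check here is a hint that ca-certificates is installed rather than proof that the proxy will find it. The helper names `has_ca_bundle` and `default_ca_locations` are made up for illustration.

```python
import os
import ssl


def has_ca_bundle(cafile, capath):
    """Return True if a non-empty CA file exists, or a CA directory has entries."""
    if cafile and os.path.isfile(cafile) and os.path.getsize(cafile) > 0:
        return True
    if capath and os.path.isdir(capath) and os.listdir(capath):
        return True
    return False


def default_ca_locations():
    """Where this Python's OpenSSL build looks for trusted roots by default.

    Either value is None when the corresponding path does not exist.
    """
    paths = ssl.get_default_verify_paths()
    return paths.cafile, paths.capath


if __name__ == "__main__":
    cafile, capath = default_ca_locations()
    print("cafile:", cafile)
    print("capath:", capath)
    print("CA bundle present:", has_ca_bundle(cafile, capath))
```

If this reports no bundle in the final image, the likely fix is the one listed above: install ca-certificates and run update-ca-certificates in that image.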

@curtbushko I use the proxy as well, and under “Connections” in the Cloud SQL console UI, I clicked this button and it started working for me.

[image: screenshot of the button in the Cloud SQL console]

Random theory: I think this might happen if someone reset the SSL configuration of the instance: that would cause the cached version of the server cert to be incorrect for some period of time. There is indeed throttling to make sure that there’s no thundering herd when this happens (and this throttling is evident in the earlier comment https://github.com/GoogleCloudPlatform/cloudsql-proxy/issues/297#issuecomment-523207974), but it should only last for 1 minute by default (https://github.com/GoogleCloudPlatform/cloudsql-proxy/blob/18df49e/proxy/proxy/client.go#L31).
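To make the theory above concrete, here is a minimal sketch of a cert cache with a refresh throttle. It is not the proxy's actual code (that is Go, at the link above); the class name `ThrottledCertCache`, its methods, and the injectable clock are all assumptions made for illustration. Only the 1 minute default comes from the comment above.

```python
import time


class ThrottledCertCache:
    """Caches a server cert and limits how often it may be re-fetched."""

    def __init__(self, fetch, min_interval=60.0, clock=time.monotonic):
        self._fetch = fetch                # callable returning the current cert
        self._min_interval = min_interval  # seconds between refreshes (assumed 60)
        self._clock = clock                # injectable so the throttle can be tested
        self._cert = None
        self._last_fetch = None

    def get(self):
        """Return the cached cert, fetching it on first use."""
        if self._last_fetch is None:
            self._refresh()
        return self._cert

    def refresh_after_failure(self):
        """Call when a handshake fails; re-fetch only if the throttle allows.

        Returns True if a refresh happened, False if it was throttled.
        """
        now = self._clock()
        if self._last_fetch is not None and now - self._last_fetch < self._min_interval:
            return False
        self._refresh()
        return True

    def _refresh(self):
        self._cert = self._fetch()
        self._last_fetch = self._clock()
```

Under this model, a cert that is rotated on the server right after a fetch stays stale in the cache until the interval has passed, so connections made in that window would fail verification and then recover on their own, which is roughly the shape of what was reported.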

Sorry about that - you can find all the Product Issue Trackers here. The Cloud SQL tracker should be here.