cloud-sql-proxy: pgx query cancel not working with proxy
Bug Description
When a query is canceled, pgx sends the cancel request, but the Cloud SQL server never receives or processes it.
- This happens when I use the pgx driver with this proxy; I'm not sure about other drivers.
- Without the proxy (e.g. connecting directly over IP), the cancel request works normally.
Example code (or command)
I wrote a piece of code using pgconn (the low-level API in pgx), but the same issue happens with pgxpool.Pool (see the sketch below).
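For reference, this is roughly what the pgxpool variant looks like (a sketch, not the exact code I ran; the connection string is a placeholder):

package main

import (
    "context"
    "log"
    "time"

    "github.com/jackc/pgx/v4/pgxpool"
)

func main() {
    pool, err := pgxpool.Connect(context.Background(), "postgres://postgres:postgres@localhost:5432/mydb?sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    defer pool.Close()

    // pgx issues a protocol-level cancel request when the context expires.
    ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
    defer cancel()
    if _, err := pool.Exec(ctx, "SELECT pg_sleep(60);"); err != nil {
        log.Printf("query failed: %s", err) // context deadline exceeded
    }
    // Through the proxy with Auto IAM AuthN, pg_stat_activity still shows pg_sleep(60) running.
}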
Demo code
package main

import (
    "context"
    "fmt"
    "log"
    "os"
    "os/signal"
    "time"

    "github.com/jackc/pgconn"
    "github.com/pkg/errors"
)

var (
    dbuser     = "postgres"
    dbpassword = "postgres"
    dbhost     = "localhost"
    dbport     = "5432"
    dbname     = "tuananh"
)

// init lets the defaults above be overridden via the standard PG* env vars.
func init() {
    if os.Getenv("PGUSER") != "" {
        dbuser = os.Getenv("PGUSER")
    }
    if os.Getenv("PGPASSWORD") != "" {
        dbpassword = os.Getenv("PGPASSWORD")
    }
    if os.Getenv("PGHOST") != "" {
        dbhost = os.Getenv("PGHOST")
    }
    if os.Getenv("PGPORT") != "" {
        dbport = os.Getenv("PGPORT")
    }
    if os.Getenv("PGDATABASE") != "" {
        dbname = os.Getenv("PGDATABASE")
    }
    log.Printf("db connection string: postgres://%s:%s@%s:%s/%s?sslmode=disable", dbuser, dbpassword, dbhost, dbport, dbname)
}

func main() {
    conn, err := connectDB()
    if err != nil {
        panic(err)
    }

    ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt)
    defer cancel()

    // [1] Run the long query in the background; ctx1 caps it at 10s.
    ctx1, cancel1 := context.WithTimeout(ctx, time.Second*10)
    defer cancel1()
    go func() {
        defer cancel1()
        query(ctx1, conn)
    }()

    time.Sleep(time.Second * 2) // give the query a 2s head start so the cancel cannot run first

    // [2] Send the cancel request for the query started above.
    ctx2, cancel2 := context.WithTimeout(ctx, time.Second*5)
    defer cancel2()
    go func() {
        defer cancel2()
        cancelquery(ctx2, conn)
    }()

    <-ctx1.Done()
    <-ctx2.Done()
    log.Printf("all processes finished. Ctrl+C to exit")
    <-ctx.Done()
}

func connectDB() (*pgconn.PgConn, error) {
    connStr := fmt.Sprintf("postgres://%s:%s@%s:%s/%s?sslmode=disable", dbuser, dbpassword, dbhost, dbport, dbname)
    ctx, cancel := context.WithTimeout(context.Background(), time.Second*5)
    defer cancel()
    conn, err := pgconn.Connect(ctx, connStr)
    if err != nil {
        return nil, errors.Wrap(err, "pgconn.Connect")
    }
    return conn, nil
}

// query runs an innocuous long-running SELECT statement.
func query(ctx context.Context, conn *pgconn.PgConn) {
    log.Println("[1] start sleeping")
    mrr := conn.Exec(ctx, "SELECT pg_sleep(60);")
    for mrr.NextResult() {
        _, _ = mrr.ResultReader().Close()
    }
    if err := mrr.Close(); err != nil { // mrr.Close() returns the first error encountered while reading results
        log.Printf("[1] query failed: %s", err)
    }
    log.Println("[1] query finished")
}

// cancelquery sends a protocol-level cancel request for whatever is running on conn.
func cancelquery(ctx context.Context, conn *pgconn.PgConn) {
    log.Println("[2] canceling query")
    err := conn.CancelRequest(ctx)
    if err != nil {
        log.Printf("[2] failed to cancel request: %s", err)
    }
    log.Println("[2] done canceling query")
}
Output
$ PGUSER=myIAMUser PGPORT=5432 PGDATABASE=mydb go run requestcancel/main.go
2022/10/31 10:33:40 db connection string: postgres://myIAMUser:postgres@localhost:5432/mydb?sslmode=disable
2022/10/31 10:33:41 [1] start sleeping
2022/10/31 10:33:43 [2] canceling query
2022/10/31 10:33:43 [2] done canceling query
2022/10/31 10:33:51 [1] query failed: timeout: context deadline exceeded
2022/10/31 10:33:51 [1] query finished
2022/10/31 10:33:51 all processes finished. Ctrl+C to exit
^C
# but when checking inside the server, the query is still running
(psql)
mydb=> select pid, state, query from pg_stat_activity where datname='mydb' and usename='myIAMUser';
   pid   | state  |        query
---------+--------+----------------------
 1582790 | active | SELECT pg_sleep(60);
(1 row)
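While testing, the stuck backend can be cleaned up server-side with pg_terminate_backend (a standard Postgres function; this is only a workaround, not a fix):

select pg_terminate_backend(pid)
from pg_stat_activity
where datname = 'mydb' and usename = 'myIAMUser' and query = 'SELECT pg_sleep(60);';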
How to reproduce
- Run cloud_sql_proxy (an example invocation is sketched after this list)
- Run the script above to connect through the proxy (with the appropriate PG* env vars)
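For example, a typical v1 proxy invocation with Auto IAM AuthN enabled looks like this (the instance connection name is a placeholder):

./cloud_sql_proxy -instances=<PROJECT:REGION:INSTANCE>=tcp:5432 -enable_iam_login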
Environment
- OS type and version: Linux anhpngt-manabie 5.15.0-52-generic #58-Ubuntu SMP Thu Oct 13 08:03:55 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
- Cloud SQL Proxy version (./cloud-sql-proxy --version): Cloud SQL Auth proxy: 1.33.0+linux.amd64
I also ran a quick test with cloud-sql-go-connector, and it had the same problem (roughly the setup sketched below).
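A sketch of that connector setup with automatic IAM authentication enabled (the instance name and DSN are placeholders, not my exact configuration):

package main

import (
    "context"
    "log"
    "net"

    "cloud.google.com/go/cloudsqlconn"
    "github.com/jackc/pgx/v4/pgxpool"
)

func main() {
    ctx := context.Background()
    // Dialer with automatic IAM authentication, matching the proxy's -enable_iam_login flag.
    d, err := cloudsqlconn.NewDialer(ctx, cloudsqlconn.WithIAMAuthN())
    if err != nil {
        log.Fatal(err)
    }
    config, err := pgxpool.ParseConfig("user=myIAMUser dbname=mydb sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    // Route all connections through the connector instead of plain TCP.
    config.ConnConfig.DialFunc = func(ctx context.Context, _, _ string) (net.Conn, error) {
        return d.Dial(ctx, "<PROJECT:REGION:INSTANCE>")
    }
    pool, err := pgxpool.ConnectConfig(ctx, config)
    if err != nil {
        log.Fatal(err)
    }
    defer pool.Close()
}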
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 4
- Comments: 25 (16 by maintainers)
Haven’t forgotten about this. The backend team will be fixing this. I’ll update here when there’s an ETA.
Sorry this slipped off my radar. Yes, the issue has been fixed. Thanks for the reminder.
A fix has been merged and will be rolling out soon. I’ll update here when I have more information.
After doing some investigation, here's what I found:
I wrote a minimal reproduction (below) that starts a long-running query, waits for a SIGTERM, and then cancels the query. To check the server state, I ran:
select pid, state, query from pg_stat_activity;
For both v1 and v2:
Minimal Reproduction:
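(A sketch of what such a reproduction could look like, based on the description above; this is not the maintainer's original code.)

package main

import (
    "context"
    "log"
    "os"
    "os/signal"
    "syscall"

    "github.com/jackc/pgconn"
)

func main() {
    conn, err := pgconn.Connect(context.Background(), os.Getenv("DATABASE_URL"))
    if err != nil {
        log.Fatal(err)
    }

    // Start the long-running query in the background.
    done := make(chan struct{})
    go func() {
        defer close(done)
        mrr := conn.Exec(context.Background(), "SELECT pg_sleep(300);")
        if err := mrr.Close(); err != nil {
            log.Printf("query ended: %s", err) // an error here is expected once the cancel lands
        }
    }()

    // Wait for SIGTERM, then issue the protocol-level cancel.
    sigs := make(chan os.Signal, 1)
    signal.Notify(sigs, syscall.SIGTERM)
    <-sigs
    if err := conn.CancelRequest(context.Background()); err != nil {
        log.Printf("cancel failed: %s", err)
    }
    <-done
}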
Nice find @danmcgee-soda. I’m going to escalate this to see if we can get a fix shipped sooner.
Wanted to add a bit more information that we discovered after investigating this issue today and stumbling onto this bug report.
With
cloud_sql_proxy -dir=/tmp -instances <instance>
canceling works fine. But with the -enable_iam_login flag added, e.g.
cloud_sql_proxy -dir=/tmp -instances <instance> -enable_iam_login
cancellation breaks. I tried cancelling three times in the second case, and none of them got through.
Another important bit: I actually used username/password auth in both cases. The determining factor is not which type of auth you USE, but only whether automatic IAM authentication is enabled.
Quick update: I’ve shared this with the backend team and they’re investigating.
This would be it. 🤓
When I connect to my Cloud SQL instance using psql and the Proxy with Auto IAM AuthN enabled, and issue a
select pg_cancel_backend(<PID>);
the cancellation works.
EDIT: I see that pg_cancel_backend uses a wholly separate path than CancelRequest, though. So this is likely a case where the Proxy server isn't honoring the Postgres protocol. I'll file a bug with the backend team and get them to look at it.
Thanks @anhpngt. I'll do some digging on my end. I'm curious to see if the proxy client -> proxy server connection is somehow not propagating the backend PID properly. I'll report back here.
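For context on why the backend PID matters here: in the Postgres protocol, a CancelRequest travels over a brand-new connection and identifies the target session only by the PID and secret key from BackendKeyData, so a proxy in the middle has to route it to the same backend. A sketch of the wire format (the sendCancelRequest helper and its parameters are illustrative, not part of pgconn):

package main

import (
    "encoding/binary"
    "net"
)

// sendCancelRequest writes the 16-byte CancelRequest message from the Postgres
// wire protocol: total length, the magic code 80877102, then the backend PID
// and secret key the server handed out in BackendKeyData. pg_cancel_backend,
// by contrast, is plain SQL over an existing session, which is why it can still
// work when this protocol-level path is broken.
func sendCancelRequest(addr string, pid, secretKey uint32) error {
    conn, err := net.Dial("tcp", addr)
    if err != nil {
        return err
    }
    defer conn.Close()

    buf := make([]byte, 16)
    binary.BigEndian.PutUint32(buf[0:4], 16)       // message length, including itself
    binary.BigEndian.PutUint32(buf[4:8], 80877102) // CancelRequest code
    binary.BigEndian.PutUint32(buf[8:12], pid)
    binary.BigEndian.PutUint32(buf[12:16], secretKey)
    _, err = conn.Write(buf)
    return err
}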