pgjdbc: Connection hang during initialization / enableSSL

Please read https://stackoverflow.com/help/minimal-reproducible-example

Describe the issue We recently had our application hang due to a new JDBC connection blocking indefinitely during initialization. This issue is somewhat similar to #1822 (SSL handshake stuck forever) but the blocking read occurs at a slightly earlier point in the connection initialization in reading a protocol control code from the server.

Driver Version? 42.2.19 Java Version? java version “1.8.0_101” Java™ SE Runtime Environment (build 1.8.0_101-b13) Java HotSpot™ 64-Bit Server VM (build 25.101-b13, mixed mode) OS Version? Ubuntu 20.04.2 PostgreSQL Version? 12.7 To Reproduce Steps to reproduce the behaviour: The connection hang occurred when we restarted a DB server. Our connection pooling infrastructure attempted to reconnect and the connection hang during the initialization of the connection. We’ve never seen this issue before and we do sometimes restart servers for maintenance so this is likely challenging to reproduce and a rare edge case. Expected behaviour Ideally the code in ConnectionFactoryImpl.enableSSL when reading the response code would not block indefinitely and, in general, any read operations during connection initialization would have a reasonable timeout set. Normally we would want to avoid a read timeout for a JDBC Connection socket for query operations but using timeouts carefully during connection initialization and control operations makes good sense and it appears that PGStream does this for some operations already.

Logs All we have is a stack trace from a thread dump:

"workflow-worker-47" #184 prio=5 os_prio=0 tid=0x00007f6c7001c800 nid=0xd1 runnable [0x00007f6c6e6e6000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:161)
	at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:128)
	at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:113)
	at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:73)
	at org.postgresql.core.PGStream.receiveChar(PGStream.java:443)
	at org.postgresql.core.v3.ConnectionFactoryImpl.enableSSL(ConnectionFactoryImpl.java:505)
	at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:149)
	at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:213)
	at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:51)
	at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:223)
	at org.postgresql.Driver.makeConnection(Driver.java:465)
	at org.postgresql.Driver.connect(Driver.java:264)
	at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:138)
	at com.zaxxer.hikari.pool.PoolBase.newConnection(PoolBase.java:358)
	at com.zaxxer.hikari.pool.PoolBase.newPoolEntry(PoolBase.java:206)
	at com.zaxxer.hikari.pool.HikariPool.createPoolEntry(HikariPool.java:477)
	at com.zaxxer.hikari.pool.HikariPool.checkFailFast(HikariPool.java:560)
	at com.zaxxer.hikari.pool.HikariPool.<init>(HikariPool.java:115)
	at com.zaxxer.hikari.HikariDataSource.<init>(HikariDataSource.java:81)
        ...

Code to reproduce I don’t see an easy way to reproduce this in code since it seemingly depends upon some unusual breaking of the server side of the connection or some unusual server state during a restart. Our servers are AWS RDS Aurora instances too so the exact state of the communications or break in such with the server that allowed the protracted block-on-read to occur may even be peculiar to that environment. This said, the driver should likely be resilient to such situations and not block indefinitely during connection initialization.

About this issue

  • Original URL
  • State: open
  • Created 2 years ago
  • Comments: 21 (10 by maintainers)

Most upvoted comments

The other issue with this is that I recall doing this sets the socket timeout for the whole connection/session and we have (regrettably) some long-running queries so we want the timeout to be relatively short scoped during the connection phase. Could the PGProperty.CONNECT_TIMEOUT be applied to the SSL setup phase as well?

Yes, I realized the error of this.

I haven’t looked at the code in a while but I THINK the int beresp = pgStream.receiveChar(); line in PGStream.enableSSL is the line where we hung and that might benefit from the timeout when reading the control character if that helps as a starting point.

Ya, that’s where it is. At this point we have no control and we are just waiting for the SSL handshake to return.

Given that today is Friday feel free to bug me on Monday if I haven’t gotten around to this by then.