pgjdbc: Unexpected packet type during stream replication

I’m submitting a …

  • bug report
  • feature request

Describe the issue When using PGReplicationStream to get data off of a replication slot, we occasionally encounter an Unexpected packet type while processing copy data. The control character that we are actually receiving is a lowercase ‘w’ which, when looking through other parts of the code, appears to be the control character that we’d receive when PostgreSQL is telling pgjdbc to expect xlog data.

99% of the time, this is fine since the replication is resilient and just kinda picks up where it left off after the exception is raised. We have, however, encountered times when, no matter what we did, reading from that stream always failed at some point during a transaction and the replication lag got rather out of hand. At first, we thought this had to do with the size of the transactions that were failing but it does not seem to make a difference how big these transactions are.

Driver Version? RE42.2.5

Java Version? openjdk version “1.8.0_212”

OS Version? debian 9.8

PostgreSQL Version? We’ve seen this on PostgreSQL 9.6.6 and PostgresSQL 11 (Latest)

To Reproduce We prepared a toy app and docker compose file that should, at some point, cause the error.

https://github.com/omgrr/streamProblems

We usually see the error within 15 minutes, but occasionally it has taken an hour. There is an example log file in the repo to see what this error typically looks like.

Expected behaviour This seems to be happening when pgjdbc requests a status update. We would expect to see the correct response from postgres, but instead the next byte on the buffer indicates more xlog data.

Logs

The full compressed logs can be found on in the in example repo, in the example_packet_error.log.gz log file.

pgjdbc logs around the exception -

Apr 09, 2019 7:18:18 PM org.postgresql.core.v3.QueryExecutorImpl processCopyResults                                           
FINEST:  <=BE CopyData                                                                                                        
Apr 09, 2019 7:18:18 PM org.postgresql.core.v3.replication.V3PGReplicationStream processXLogData                              
FINEST:   <=BE XLogData(currWal: 0/1DE4D18, lastServerWal: 0/1DE4D18, clock: 0)                                               
Apr 09, 2019 7:18:18 PM org.postgresql.core.v3.QueryExecutorImpl processCopyResults                                           
FINEST:  <=BE CopyData                                                                                                        
Apr 09, 2019 7:18:18 PM org.postgresql.core.v3.replication.V3PGReplicationStream processXLogData                              
FINEST:   <=BE XLogData(currWal: 0/1DE4D18, lastServerWal: 0/1DE4D18, clock: 0)                                               
Apr 09, 2019 7:18:18 PM org.postgresql.core.v3.QueryExecutorImpl processCopyResults                                           
FINEST:  <=BE CopyData                                                                                                        
Apr 09, 2019 7:18:18 PM org.postgresql.core.v3.replication.V3PGReplicationStream processXLogData                              
FINEST:   <=BE XLogData(currWal: 0/1DE4D18, lastServerWal: 0/1DE4D18, clock: 0)                                               
Apr 09, 2019 7:18Exception in thread "main" org.postgresql.util.PSQLException: Database connection failed when writing to copy
 at org.postgresql.core.v3.QueryExecutorImpl.writeToCopy(QueryExecutorImpl.java:1000)                                         
 at org.postgresql.core.v3.CopyDualImpl.writeToCopy(CopyDualImpl.java:19)                                                     
 at org.postgresql.core.v3.replication.V3PGReplicationStream.updateStatusInternal(V3PGReplicationStream.java:189)             
 at org.postgresql.core.v3.replication.V3PGReplicationStream.timeUpdateStatus(V3PGReplicationStream.java:181)                 
 at org.postgresql.core.v3.replication.V3PGReplicationStream.readInternal(V3PGReplicationStream.java:121)                     
 at org.postgresql.core.v3.replication.V3PGReplicationStream.readPending(V3PGReplicationStream.java:78)                       
 at streamProblem.main(streamProblem.java:43)                                                                                 
Caused by: java.io.IOException: Unexpected packet type during copy: 119                                                       
 at org.postgresql.core.v3.QueryExecutorImpl.processCopyResults(QueryExecutorImpl.java:1236)                                  
 at org.postgresql.core.v3.QueryExecutorImpl.writeToCopy(QueryExecutorImpl.java:998)                                          
 ... 6 more                                                                                                                   
:18 PM org.postgresql.core.v3.QueryExecutorImpl processCopyResults                                                            
FINEST:  <=BE CopyData                                                                                                        
Apr 09, 2019 7:18:18 PM org.postgresql.core.v3.replication.V3PGReplicationStream processXLogData                              
FINEST:   <=BE XLogData(currWal: 0/1DE4D18, lastServerWal: 0/1DE4D18, clock: 0)                                               
Apr 09, 2019 7:18:18 PM org.postgresql.core.v3.QueryExecutorImpl processCopyResults                                           
FINEST:  <=BE CopyData                                                                                                        
Apr 09, 2019 7:18:18 PM org.postgresql.core.v3.replication.V3PGReplicationStream processXLogData                              
FINEST:   <=BE XLogData(currWal: 0/1DE4D18, lastServerWal: 0/1DE4D18, clock: 0)                                               
Apr 09, 2019 7:18:18 PM org.postgresql.core.v3.QueryExecutorImpl processCopyResults                                           
FINEST:  <=BE CopyData                                                                                                        
Apr 09, 2019 7:18:18 PM org.postgresql.core.v3.replication.V3PGReplicationStream processXLogData                              
FINEST:   <=BE XLogData(currWal: 0/1DE6968, lastServerWal: 0/1DE6968, clock: 0)                                               
Apr 09, 2019 7:18:18 PM org.postgresql.core.v3.QueryExecutorImpl processCopyResults                                           
FINEST:  <=BE CopyData                                                                                                        

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 67 (44 by maintainers)

Most upvoted comments

Fixed in #1467 Thanks everyone for finding and testing this!

OK, I’ve been looking at this for the last couple days. It is very repeatable.

Many times the stream pgstream appears to be reset just before the problem as the index is very close to 0

see https://github.com/pgjdbc/pgjdbc/pull/1470. There is at least one bug fixed but apparently that wasn’t it.

What appears to happen is that we miss reading the ‘d’ and get an invalid character. So either the code is reading too far in the previous message or we are mis-handling the buffer…

I’m off for a few days so won’t be back to this probably until after Easter

@omgrr any chance you guys can test https://github.com/pgjdbc/pgjdbc/pull/1603 ? I was never happy with the hack I did to fix this. SocketTimeout looks like a more reasonable solution.

thanks @davecramer ! I’ll give it a try if encounter this issue again.