mongo-php-driver: Intermittent error: MongoConnectionException: No suitable servers found (`serverselectiontryonce` set): [Server closed connection. calling ismaster on 'x.x.x.x:27017']

Description

Intermittent errors observed with mongo-php-driver.

First tried 1.2.5, but have since reverted to 1.1.9.

1.2.5 produced errors far more frequently (approaching several per minute), whereas 1.1.9 produces only a few per hour.

Adjusted the following things (a sketch of the adjusted connection string follows the list):

  1. Converted all hostnames to ip addresses in connection strings (https://github.com/mongodb/mongo-php-driver/issues/133)
  2. Set default connectTimeoutMS and socketTimeoutMS to 20000ms (https://github.com/mongodb/mongo-php-driver/issues/374)
  3. Set default ‘w’ to 1, and wTimeoutMS to 20000ms (https://groups.google.com/forum/#!msg/mongodb-dev/zlzvXo6pOcI/1y5KcEZ-IwAJ)
  4. Backed out to 1.1.9 (https://github.com/mongodb/mongo-php-driver/issues/425)
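
For reference, a minimal sketch of a connection string combining those options; the host IPs and option values below are placeholders, not the actual values from this deployment:

<?php
// Placeholder hosts and values; only the option names mirror the adjustments above.
$uri = 'mongodb://10.0.0.1:27017,10.0.0.2:27017/'
     . '?connectTimeoutMS=20000'
     . '&socketTimeoutMS=20000'
     . '&w=1'
     . '&wTimeoutMS=20000';

$manager = new MongoDB\Driver\Manager($uri);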

Environment

PHP 7.0.15 on CentOS 7.3.1611 x86_64

$ php -i | grep -E 'mongodb|libmongoc|libbson'

mongodb
mongodb support => enabled
mongodb version => 1.1.9
mongodb stability => stable
libmongoc version => 1.3.6
libbson version => 1.3.6
mongodb.debug => no value => no value

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Comments: 31 (12 by maintainers)

Most upvoted comments

Reading through older posts in this thread, I imagine this might have been due to CDRIVER-2075 being included in PHP driver 1.3.0, which was released in September 2017 (after the last activity on this thread). That added logic in libmongoc to retry failed isMaster commands exactly once, which allows the driver to recover from a dropped connection when the MongoDB server itself is still available.

Either way, thanks for confirming that the previous error is resolved on your end.

It looks like this issue is resolved with the latest versions of:

  • MongoDB 3.6.3
  • MongoDB PHP Driver 1.4.2
  • MongoDB PHP Library 1.3.0
  • PHP 7.2.3
  • Alpine 3.7

Read preference describes how MongoDB clients route read operations to the members of a replica set. If I define no replica set, why is this even tried?

@narcoticfresh: If you look through the MongoDB PHP library source, you’ll see that it frequently calls MongoDB\Driver\Manager::selectServer() to select a single server before executing an operation. If you have not configured a custom read preference, we use “primary” by default (which works just fine for a standalone topology). The role of read preferences in server selection is explained thoroughly in our Server Selection specification.

That said, the library’s server selection is largely an implementation detail. We need a MongoDB\Driver\Server object to extract version information to construct our commands according to server-side API differences (see MongoDB\Operation\ListIndexes::execute() for one example). We then want to ensure that the command is executed on the very same server whose information we considered, so we rely on the MongoDB\Driver\Server::executeBulkWrite(), executeCommand(), and executeQuery() methods instead of those found on MongoDB\Driver\Manager.
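
For illustration, here is a rough sketch of that select-then-execute pattern using the driver directly; the URI, namespace, and filter are placeholders rather than the library's actual code:

<?php
$manager = new MongoDB\Driver\Manager('mongodb://127.0.0.1:27017');

// Server selection with the default "primary" read preference; this is the
// step that can throw the "No suitable servers found" exception.
$readPreference = new MongoDB\Driver\ReadPreference(MongoDB\Driver\ReadPreference::RP_PRIMARY);
$server = $manager->selectServer($readPreference);

// Execute the operation on the very same server that was just selected.
$cursor = $server->executeQuery('db.collection', new MongoDB\Driver\Query(['x' => 1]));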

If you were using the driver directly and called executeBulkWrite(), executeCommand(), or executeQuery() on the MongoDB\Driver\Manager object, server selection would still happen internally and potentially throw the same “No suitable servers found” exception.

Is there any other possible reason for this error to happen besides connectivity?

To clarify, “No suitable servers found” is the general exception that server selection failed. This could happen for any number of reasons: selecting a secondary when none exist; applying read preference tags that do not match any servers; loss of connectivity to previously available servers. When reporting a server selection error, libmongoc aggregates any internal errors in its message, which we use for our exception message. The relevant error in your case is what follows “No suitable servers found”: “Server closed connection. calling ismaster on mongodb.myhost.internal:27017.” That hints towards a connectivity issue and may be the result of HAProxy terminating connections.
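
For example, that aggregated message is what you would see on the exception raised during server selection. A minimal sketch, with a placeholder URI and a simple ping command:

<?php
$manager = new MongoDB\Driver\Manager('mongodb://mongodb.myhost.internal:27017');

try {
    $manager->executeCommand('admin', new MongoDB\Driver\Command(['ping' => 1]));
} catch (MongoDB\Driver\Exception\ConnectionException $e) {
    // The message contains the per-server errors aggregated by libmongoc, e.g.
    // "No suitable servers found (`serverSelectionTryOnce` set): [Server closed
    // connection. calling ismaster on 'mongodb.myhost.internal:27017']"
    error_log($e->getMessage());
}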

Is there any tweak possible to deactivate this behavior?

As I mentioned above in my replies to @sandrokeil, serverSelectionTryOnce=false may help by allowing the driver to make repeated attempts to find a server up to a configurable timeout (default is 30 seconds). This comes at the expense of potentially delaying your application’s response times if no server is available, which is why it is disabled by default. The current fail-fast behavior is also similar to how the legacy driver performed.

I elaborate on serverSelectionTryOnce=false a bit more in https://github.com/mongodb/mongo-php-driver/issues/564#issuecomment-288427486 if you’d like to read that for additional context.
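
As a sketch of what that looks like in practice (the hosts and replica set name are placeholders, and serverSelectionTimeoutMS is only set explicitly here to show where it is configured):

<?php
// serverSelectionTryOnce=false lets the driver keep retrying server selection
// until serverSelectionTimeoutMS elapses (30000 ms is already the default).
$manager = new MongoDB\Driver\Manager(
    'mongodb://node0.mongodb.local:27017,node1.mongodb.local:27017,node2.mongodb.local:27017/'
    . '?replicaSet=cluster'
    . '&serverSelectionTryOnce=false'
    . '&serverSelectionTimeoutMS=30000'
);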

I’ve updated the connection params of the test repository with serverSelectionTryOnce=false. As you described, the not master error is gone, but now I’m getting "Failed to send "insert" command with database "testing": socket error or timeout" and "Failed to send "find" command with database "testing": socket error or timeout", even some minutes after the test (reloading the web page manually) or when running the updated test suite. And the not master error comes back. 🤔

This agrees with what I shared in the middle of my second-to-last post. Adding serverSelectionTryOnce=false to the URI only suppresses the “No suitable servers found” errors, as it allows the driver up to 30 seconds to select a server. If no suitable servers are found during that selection window, the driver will more actively monitor the known servers according to minHeartbeatFrequencyMS, which is 500ms. I realize the spec doesn’t refer to single-threaded implementations in that section, although libmongoc does follow the same logic for PHP. I’ve opened SPEC-866 to request that the spec be updated to note its applicability to single-threaded implementations.

The “insert” and “find” failures due to “socket error or timeout” were expected, as the driver only realizes that the connection was closed at the time it tries to use it for an operation. It is the application’s responsibility to catch that exception and decide whether to retry. The “not master” exception was also expected, since the node reporting that error was last known to be a primary when server selection took place. The driver does not realize that the node is ineligible to handle a write until that error comes back.
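
As one hedged illustration of that application-level handling (the namespace, helper name, and single-retry policy below are arbitrary choices for the sketch, not driver behavior):

<?php
// Sketch: retry an insert once after a failure such as "not master" or
// "socket error or timeout". Whether a retry is safe is up to the application;
// a failed insert like this did not partially succeed, so retrying is reasonable.
function insertWithOneRetry(MongoDB\Driver\Manager $manager, array $document)
{
    for ($attempt = 0; $attempt < 2; $attempt++) {
        // A BulkWrite may only be executed once, so build a fresh one per attempt.
        $bulk = new MongoDB\Driver\BulkWrite();
        $bulk->insert($document);

        try {
            return $manager->executeBulkWrite('testing.write_test', $bulk);
        } catch (MongoDB\Driver\Exception\RuntimeException $e) {
            // Server selection runs again on the next attempt, by which time it
            // may reflect the updated topology.
        }
    }

    throw $e;
}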

As mentioned in the opening paragraphs of my previous post, reducing heartbeatFrequencyMS is really the only way to reduce the frequency of these errors (at the cost of increased monitoring overhead). Earlier, I was able to guarantee error-free operations by using a 10-second monitoring interval and trusting that queries were issued no sooner than 15 seconds (which I reduced from 60 seconds) after the last topology change (i.e. shutting down and restarting a node). That assumption is obviously not practical for a live application, where requests can come in at any time, so even with a reduced monitoring interval there may be windows of time where the driver is prone to unexpected topology changes and throws an exception.

And this is the corresponding cluster status (docker-compose exec mongodb0 sh status.sh). Is this info correct? (“infoMessage” : “could not find member to sync from”,)

That looks like a fine result from the replSetGetStatus command. Based on the documentation example, “could not find member to sync from” appears to be an expected “infoMessage” value for a primary server. FWIW, that seems like a redundant message, since primaries do not sync by definition.
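
(For what it’s worth, the same status can also be fetched through the driver rather than the shell script; a minimal sketch with a placeholder URI:)

<?php
// Sketch: run replSetGetStatus against the admin database and inspect the
// "members" and "infoMessage" fields of the reply document.
$manager = new MongoDB\Driver\Manager('mongodb://node0.mongodb.local:27017/?replicaSet=cluster');
$cursor  = $manager->executeCommand('admin', new MongoDB\Driver\Command(['replSetGetStatus' => 1]));
var_dump($cursor->toArray()[0]);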

@sandrokeil: I spoke with @ajdavis about the “not master” error and it does appear to be within spec. To elaborate, the situation entails a primary restarting between the driver’s heartbeat intervals. At the time the driver selects a server for the insert, the topology state still believes that node0 is the primary. This is due in part to the fact that node0 was abruptly terminated, so its previous error would have been a socket timeout. In such cases, SDAM removes the sockets but leaves the node’s state as-is, per Network error when reading or writing:

If there is a network timeout on the connection after the handshake completes, the client MUST NOT mark the server Unknown. (A timeout may indicate a slow operation on the server, rather than an unavailable server.) If, however, there is some other network error on the connection after the handshake completes, the client MUST replace the server’s description with a default ServerDescription of type Unknown, and fill the ServerDescription’s error field with useful information, the same as if an error or timeout occurred before the handshake completed.

Getting back to the situation, node0 is selected for the insert. The driver no longer has a connection to this node and must create a new one, which is followed by an isMaster command (like all new connections). In this response, node0 identifies itself as a secondary and points to node2 as the new primary. Despite this new information and ensuing topology update (i.e. “Transitioning to RSWithPrimary for RSSecondary” in the trace above), node0 remains the selected node for this operation and we predictably get a “not master” error.

Per SPEC-863, the SDAM spec purposefully does not suggest retrying server selection in this case in order to limit complexity. Only a primary server’s isMaster response is authoritative, so doing so would also require immediate monitoring updates from other nodes in the cluster (node2 in this case). If there were multiple elections or network interruptions in this period, we could see the process chain several times. By not retrying and sticking to the topology state at the time the operation was invoked, we can fail fast and allow the application to proceed as it likes. In this case, a “not master” error can be immediately retried since there is no risk that the write partially succeeded.

I opened SPEC-863 to request that the SDAM spec clarify why server selection is not retried. Additionally, I mentioned in the last paragraph that “only a primary server’s isMaster response is authoritative”, yet we clearly saw libmongoc update the topology state based on node0’s isMaster response after we reconnected and it identified itself as a secondary. This is an outstanding question for the spec, which I’ve asked in SPEC-864. In this particular case, we know that node0 has good information, so trusting its isMaster response actually helps us by providing current topology info for the subsequent driver operations; however, I would like to confirm whether this is acceptable.

With all of that said, I don’t believe we have any driver bugs present here. If SPEC-864 results in a change to the SDAM spec, there will be ensuing driver tickets created, which will result in possible changes to libmongoc. I realize our SPEC project is private in JIRA, but I’ve referenced this GitHub issue in those tickets and will try to follow up here if there is significant activity on those tickets.

@sandrokeil: When running your Docker environment as-is, I did not see any errors in the FPM logs. This may be because the start.sh script sleeps 60 seconds after stopping and restarting a node, which allows the driver adequate time to pick up on any changes. I’ll note that heartbeatFrequencyMS defaults to 60 seconds in single-threaded SDAM implementations. This determines the frequency of scans during which the topology can be updated before server selection occurs.

For reference, here are my abridged FPM logs running your Docker environment as-is:

Attaching to mongodbreplicaconnectionissue_php_1
php_1       | [16-Mar-2017 20:46:59] NOTICE: fpm is running, pid 1
php_1       | [16-Mar-2017 20:46:59] NOTICE: ready to handle connections
php_1       | 172.18.0.3 -  16/Mar/2017:20:47:27 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:20:47:27 +0000 "GET /index.php" 200
...

I cut out the lines after the first pair of requests, but there were no exceptions to report.

I then tweaked your script to only sleep 15 seconds after each node was stopped and restarted and was able to produce some exceptions. Note that the HTTP requests are still successful (i.e. 200 instead of 500) since the exceptions are caught and logged. Since your FPM configuration uses a pool of only two workers and the HTTP requests are dispatched concurrently, most of these exceptions showed up in pairs. The errors do not always appear in the same series of requests (there are sometimes error-free requests between them), although we do see them in both workers over the course of the entire test run.

php_1       | [16-Mar-2017 20:57:52] WARNING: [pool www] child 7 said into stderr: "Failed to send "find" command with database "testing": socket error or timeout"
php_1       | [16-Mar-2017 20:57:52] WARNING: [pool www] child 8 said into stderr: "Failed to send "find" command with database "testing": socket error or timeout"
php_1       | 172.18.0.3 -  16/Mar/2017:20:57:52 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:20:58:09 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:20:58:09 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:20:58:09 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:20:58:09 +0000 "GET /index.php" 200
php_1       | [16-Mar-2017 20:58:27] WARNING: [pool www] child 7 said into stderr: "Failed to send "insert" command with database "testing": socket error or timeout"
php_1       | 172.18.0.3 -  16/Mar/2017:20:58:27 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:20:58:27 +0000 "GET /index.php" 200
php_1       | [16-Mar-2017 20:58:27] WARNING: [pool www] child 8 said into stderr: "Failed to send "insert" command with database "testing": socket error or timeout"
php_1       | [16-Mar-2017 20:58:44] WARNING: [pool www] child 8 said into stderr: "No suitable servers found (`serverSelectionTryOnce` set): [Server closed connection. calling ismaster on 'node0.mongodb.local:27017'] [Server closed connection. calling ismaster on 'node1.mongodb.local:27017'] [connection closed calling ismaster on 'node2.mongodb.local:27017']"
php_1       | 172.18.0.3 -  16/Mar/2017:20:58:44 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:20:58:44 +0000 "GET /index.php" 200
php_1       | [16-Mar-2017 20:58:44] WARNING: [pool www] child 7 said into stderr: "No suitable servers found (`serverSelectionTryOnce` set): [Server closed connection. calling ismaster on 'node0.mongodb.local:27017'] [Server closed connection. calling ismaster on 'node1.mongodb.local:27017'] [connection closed calling ismaster on 'node2.mongodb.local:27017']"
php_1       | [16-Mar-2017 20:58:44] WARNING: [pool www] child 8 said into stderr: "No suitable servers found (`serverSelectionTryOnce` set): [Server closed connection. calling ismaster on 'node0.mongodb.local:27017'] [Server closed connection. calling ismaster on 'node1.mongodb.local:27017'] [connection closed calling ismaster on 'node2.mongodb.local:27017']"
php_1       | 172.18.0.3 -  16/Mar/2017:20:58:44 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:20:58:44 +0000 "GET /index.php" 200
php_1       | [16-Mar-2017 20:58:44] WARNING: [pool www] child 7 said into stderr: "No suitable servers found (`serverSelectionTryOnce` set): [Server closed connection. calling ismaster on 'node0.mongodb.local:27017'] [Server closed connection. calling ismaster on 'node1.mongodb.local:27017'] [connection closed calling ismaster on 'node2.mongodb.local:27017']"
php_1       | 172.18.0.3 -  16/Mar/2017:20:59:01 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:20:59:01 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:20:59:18 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:20:59:18 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:20:59:20 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:20:59:20 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:20:59:20 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:20:59:20 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:20:59:20 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:20:59:20 +0000 "GET /index.php" 200

At this point I decided to make a single adjustment and reduce the heartbeatFrequencyMS value for both clients created in public/index.php ($clientPrimaryPreferred and $clientSecondaryPreferred). I left the script’s sleep time at 15 seconds but reduced heartbeatFrequencyMS to 10 seconds to ensure that clients will update their topologies between node changes and web requests. As expected, the FPM logs for this run showed no exceptions at all.
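
To give a concrete, if approximate, picture of that adjustment: the change amounts to something like the following, where the URI, read preferences, and structure of public/index.php are assumptions rather than the repository’s exact code:

<?php
// heartbeatFrequencyMS reduced to 10000 ms so that each client rescans the
// topology between node restarts and web requests. Everything else is a placeholder.
$uri = 'mongodb://node0.mongodb.local:27017,node1.mongodb.local:27017,node2.mongodb.local:27017/?replicaSet=cluster';

$clientPrimaryPreferred = new MongoDB\Client($uri, [
    'heartbeatFrequencyMS' => 10000,
    'readPreference'       => 'primaryPreferred',
]);

$clientSecondaryPreferred = new MongoDB\Client($uri, [
    'heartbeatFrequencyMS' => 10000,
    'readPreference'       => 'secondaryPreferred',
]);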

Lastly, I reverted the heartbeatFrequencyMS change and instead added serverSelectionTryOnce=false. Per the server selection spec, this changes the default behavior for single-threaded drivers so that, if the first selection attempt fails, they continue polling until a timeout (30 seconds by default) is reached. Running the test again did produce some exceptions in the FPM logs, but “No suitable servers found” errors were absent, as expected.

php_1       | 172.18.0.3 -  16/Mar/2017:22:01:37 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:22:01:37 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:22:01:43 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:22:01:43 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:22:02:00 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:22:02:00 +0000 "GET /index.php" 200
php_1       | [16-Mar-2017 22:02:16] WARNING: [pool www] child 7 said into stderr: "Failed to send "find" command with database "testing": socket error or timeout"
php_1       | 172.18.0.3 -  16/Mar/2017:22:02:16 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:22:02:16 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:22:02:16 +0000 "GET /index.php" 200
php_1       | [16-Mar-2017 22:02:16] WARNING: [pool www] child 8 said into stderr: "Failed to send "find" command with database "testing": socket error or timeout"
php_1       | 172.18.0.3 -  16/Mar/2017:22:02:16 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:22:02:33 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:22:02:33 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:22:02:50 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:22:02:50 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:22:02:50 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:22:02:50 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:22:03:08 +0000 "GET /index.php" 200
php_1       | [16-Mar-2017 22:03:08] WARNING: [pool www] child 7 said into stderr: "Failed to send "insert" command with database "testing": socket error or timeout"
php_1       | [16-Mar-2017 22:03:08] WARNING: [pool www] child 8 said into stderr: "Failed to send "insert" command with database "testing": socket error or timeout"
php_1       | 172.18.0.3 -  16/Mar/2017:22:03:08 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:22:03:25 +0000 "GET /index.php" 200
php_1       | [16-Mar-2017 22:03:25] WARNING: [pool www] child 7 said into stderr: "not master"
php_1       | [16-Mar-2017 22:03:25] WARNING: [pool www] child 8 said into stderr: "not master"
php_1       | 172.18.0.3 -  16/Mar/2017:22:03:25 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:22:03:26 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:22:03:26 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:22:03:26 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:22:03:26 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:22:03:26 +0000 "GET /index.php" 200
php_1       | 172.18.0.3 -  16/Mar/2017:22:03:26 +0000 "GET /index.php" 200

The first two errors we see here are due to the driver attempting to read or write on a terminated socket. Since the test script kills the nodes abruptly, this error is not unexpected; we might see the same thing for any network interruption. The third error, “not master”, is the result of the driver not realizing that the server it selected is no longer a primary. By tracing the request I obtained the following log:

     mongoc: TRACE   > ENTRY: mongoc_topology_description_select():747"
     mongoc: TRACE   > TRACE: mongoc_topology_description_select():782 Topology type [ReplicaSetWithPrimary], selected [RSPrimary] [node0.mongodb.local:27017]"
     mongoc: TRACE   >  EXIT: mongoc_topology_description_select():785"
     mongoc: TRACE   > ENTRY: mongoc_server_description_handle_ismaster():497"
     mongoc: TRACE   >  EXIT: mongoc_server_description_handle_ismaster():624"
     mongoc: TRACE   > ENTRY: mongoc_server_description_destroy():153"
     mongoc: TRACE   >  EXIT: mongoc_server_description_destroy():159"
 PHONGO-BSON: TRACE   > Added new _id"
     mongoc: TRACE   > ENTRY: mongoc_bulk_operation_insert():290"
     mongoc: TRACE   > ENTRY: _mongoc_write_command_init_insert():192"
     mongoc: TRACE   > ENTRY: _mongoc_write_command_insert_append():82"
     mongoc: TRACE   >  EXIT: _mongoc_write_command_insert_append():112"
     mongoc: TRACE   >  EXIT: _mongoc_write_command_init_insert():208"
     mongoc: TRACE   >  EXIT: mongoc_bulk_operation_insert():312"
     mongoc: TRACE   > ENTRY: mongoc_bulk_operation_execute():598"
     mongoc: TRACE   > ENTRY: _mongoc_write_result_init():1464"
     mongoc: TRACE   >  EXIT: _mongoc_write_result_init():1474"
    cluster: TRACE   > ENTRY: mongoc_cluster_stream_for_server():1565"
    cluster: TRACE   > ENTRY: _mongoc_cluster_stream_for_server():1501"
 topology_scanner: TRACE   > ENTRY: mongoc_topology_scanner_node_connect_tcp():427"
     socket: TRACE   > ENTRY: mongoc_socket_new():727"
     socket: TRACE   > ENTRY: _mongoc_socket_setnodelay():271"
     socket: TRACE   >  EXIT: _mongoc_socket_setnodelay():283"
     socket: TRACE   >  EXIT: mongoc_socket_new():751"
     socket: TRACE   > ENTRY: mongoc_socket_connect():590"
     socket: TRACE   > TRACE: _mongoc_socket_capture_errno():336 setting errno: 115"
     socket: TRACE   > TRACE: _mongoc_socket_errno_is_again():360 errno is: 115"
     socket: TRACE   > ENTRY: _mongoc_socket_wait():113"
     socket: TRACE   >  EXIT: _mongoc_socket_wait():154"
     socket: TRACE   >  EXIT: mongoc_socket_connect():615"
     stream: TRACE   > ENTRY: mongoc_stream_wait():416"
     stream: TRACE   > ENTRY: _mongoc_stream_socket_poll():226"
     socket: TRACE   > ENTRY: mongoc_socket_poll():217"
     stream: TRACE   >  EXIT: _mongoc_stream_socket_poll():252"
     stream: TRACE   >  EXIT: mongoc_stream_wait():438"
    cluster: TRACE   > ENTRY: _mongoc_stream_run_ismaster():433"
    cluster: TRACE   > ENTRY: mongoc_cluster_run_command_internal():170"
     stream: TRACE   > ENTRY: _mongoc_stream_writev_full():490"
     stream: TRACE   > ENTRY: mongoc_stream_writev():149"
     stream: TRACE   > TRACE: mongoc_stream_writev():161 writev = 0x561dc49bb840 [9]"
     stream: TRACE   > 00000:  3a 00 00 00 06 00 00 00  00 00 00 00 d4 07 00 00  : . . . . . . .  . . . . . . . ."
     stream: TRACE   > 00010:  04 00 00 00 61 64 6d 69  6e 2e 24 63 6d 64 00 00  . . . . a d m i  n . $ c m d . ."
     stream: TRACE   > 00020:  00 00 00 ff ff ff ff 13  00 00 00 10 69 73 6d 61  . . . . . . . .  . . . . i s m a"
     stream: TRACE   > 00030:  73 74 65 72 00 01 00 00  00 00                    s t e r . . . .  . ."
     stream: TRACE   > ENTRY: _mongoc_stream_socket_writev():202"
     socket: TRACE   > ENTRY: mongoc_socket_sendv():1089"
     socket: TRACE   > ENTRY: _mongoc_socket_try_sendv():998"
     socket: TRACE   > TRACE: _mongoc_socket_try_sendv():1004 sendbuf = 0x561dc49bb960 [9]"
     socket: TRACE   > 00000:  3a 00 00 00 06 00 00 00  00 00 00 00 d4 07 00 00  : . . . . . . .  . . . . . . . ."
     socket: TRACE   > 00010:  04 00 00 00 61 64 6d 69  6e 2e 24 63 6d 64 00 00  . . . . a d m i  n . $ c m d . ."
     socket: TRACE   > 00020:  00 00 00 ff ff ff ff 13  00 00 00 10 69 73 6d 61  . . . . . . . .  . . . . i s m a"
     socket: TRACE   > 00030:  73 74 65 72 00 01 00 00  00 00                    s t e r . . . .  . ."
     socket: TRACE   > TRACE: _mongoc_socket_try_sendv():1020 Send 58 out of 4 bytes"
     socket: TRACE   >  EXIT: _mongoc_socket_try_sendv():1050"
     socket: TRACE   > TRACE: mongoc_socket_sendv():1100 Sent 58 (of 4) out of iovcnt=9"
     socket: TRACE   > TRACE: mongoc_socket_sendv():1125 still got bytes left: sent -= iov_len: 58 -= 4"
     socket: TRACE   > TRACE: mongoc_socket_sendv():1125 still got bytes left: sent -= iov_len: 54 -= 4"
     socket: TRACE   > TRACE: mongoc_socket_sendv():1125 still got bytes left: sent -= iov_len: 50 -= 4"
     socket: TRACE   > TRACE: mongoc_socket_sendv():1125 still got bytes left: sent -= iov_len: 46 -= 4"
     socket: TRACE   > TRACE: mongoc_socket_sendv():1125 still got bytes left: sent -= iov_len: 42 -= 4"
     socket: TRACE   > TRACE: mongoc_socket_sendv():1125 still got bytes left: sent -= iov_len: 38 -= 11"
     socket: TRACE   > TRACE: mongoc_socket_sendv():1125 still got bytes left: sent -= iov_len: 27 -= 4"
     socket: TRACE   > TRACE: mongoc_socket_sendv():1125 still got bytes left: sent -= iov_len: 23 -= 4"
     socket: TRACE   > TRACE: mongoc_socket_sendv():1125 still got bytes left: sent -= iov_len: 19 -= 19"
     socket: TRACE   > TRACE: mongoc_socket_sendv():1134 Finished the iovecs"
     socket: TRACE   >  EXIT: mongoc_socket_sendv():1165"
     socket: TRACE   > TRACE: mongoc_socket_errno():307 Current errno: 115"
     stream: TRACE   >  EXIT: _mongoc_stream_socket_writev():208"
     stream: TRACE   >  EXIT: mongoc_stream_writev():164"
     stream: TRACE   > TRACE: _mongoc_stream_writev_full():497 writev returned: 58"
     stream: TRACE   >  EXIT: _mongoc_stream_writev_full():521"
     stream: TRACE   > ENTRY: mongoc_stream_read():268"
     stream: TRACE   > ENTRY: mongoc_stream_readv():226"
     stream: TRACE   > ENTRY: _mongoc_stream_socket_readv():139"
     socket: TRACE   > ENTRY: mongoc_socket_recv():797"
     socket: TRACE   > TRACE: _mongoc_socket_capture_errno():336 setting errno: 11"
     socket: TRACE   > TRACE: _mongoc_socket_errno_is_again():360 errno is: 11"
     socket: TRACE   > ENTRY: _mongoc_socket_wait():113"
     socket: TRACE   >  EXIT: _mongoc_socket_wait():154"
     socket: TRACE   >  GOTO: mongoc_socket_recv():816 again"
     socket: TRACE   >  EXIT: mongoc_socket_recv():826"
     stream: TRACE   >  EXIT: _mongoc_stream_socket_readv():188"
     stream: TRACE   > TRACE: mongoc_stream_readv():236 readv = 0x7fff8f7fd1c0 [1]"
     stream: TRACE   > 00000:  01 02 00 00 3b 00 00 00  06 00 00 00 01 00 00 00  . . . . ; . . .  . . . . . . . ."
     stream: TRACE   > 00010:  08 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  . . . . . . . .  . . . . . . . ."
     stream: TRACE   > 00020:  01 00 00 00                                       . . . ."
     stream: TRACE   >  EXIT: mongoc_stream_readv():239"
     stream: TRACE   >  EXIT: mongoc_stream_read():280"
     stream: TRACE   > ENTRY: mongoc_stream_read():268"
     stream: TRACE   > ENTRY: mongoc_stream_readv():226"
     stream: TRACE   > ENTRY: _mongoc_stream_socket_readv():139"
     socket: TRACE   > ENTRY: mongoc_socket_recv():797"
     socket: TRACE   >  EXIT: mongoc_socket_recv():826"
     stream: TRACE   >  EXIT: _mongoc_stream_socket_readv():188"
     stream: TRACE   > TRACE: mongoc_stream_readv():236 readv = 0x7fff8f7fd1c0 [1]"
     stream: TRACE   > 00000:  dd 01 00 00 04 68 6f 73  74 73 00 68 00 00 00 02  . . . . . h o s  t s . h . . . ."
     stream: TRACE   > 00010:  30 00 1a 00 00 00 6e 6f  64 65 30 2e 6d 6f 6e 67  0 . . . . . n o  d e 0 . m o n g"
     stream: TRACE   > 00020:  6f 64 62 2e 6c 6f 63 61  6c 3a 32 37 30 31 37 00  o d b . l o c a  l : 2 7 0 1 7 ."
     stream: TRACE   > 00030:  02 31 00 1a 00 00 00 6e  6f 64 65 31 2e 6d 6f 6e  . 1 . . . . . n  o d e 1 . m o n"
     stream: TRACE   > 00040:  67 6f 64 62 2e 6c 6f 63  61 6c 3a 32 37 30 31 37  g o d b . l o c  a l : 2 7 0 1 7"
     stream: TRACE   > 00050:  00 02 32 00 1a 00 00 00  6e 6f 64 65 32 2e 6d 6f  . . 2 . . . . .  n o d e 2 . m o"
     stream: TRACE   > 00060:  6e 67 6f 64 62 2e 6c 6f  63 61 6c 3a 32 37 30 31  n g o d b . l o  c a l : 2 7 0 1"
     stream: TRACE   > 00070:  37 00 00 02 73 65 74 4e  61 6d 65 00 08 00 00 00  7 . . . s e t N  a m e . . . . ."
     stream: TRACE   > 00080:  63 6c 75 73 74 65 72 00  10 73 65 74 56 65 72 73  c l u s t e r .  . s e t V e r s"
     stream: TRACE   > 00090:  69 6f 6e 00 01 00 00 00  08 69 73 6d 61 73 74 65  i o n . . . . .  . i s m a s t e"
     stream: TRACE   > 000a0:  72 00 00 08 73 65 63 6f  6e 64 61 72 79 00 01 02  r . . . s e c o  n d a r y . . ."
     stream: TRACE   > 000b0:  70 72 69 6d 61 72 79 00  1a 00 00 00 6e 6f 64 65  p r i m a r y .  . . . . n o d e"
     stream: TRACE   > 000c0:  32 2e 6d 6f 6e 67 6f 64  62 2e 6c 6f 63 61 6c 3a  2 . m o n g o d  b . l o c a l :"
     stream: TRACE   > 000d0:  32 37 30 31 37 00 02 6d  65 00 1a 00 00 00 6e 6f  2 7 0 1 7 . . m  e . . . . . n o"
     stream: TRACE   > 000e0:  64 65 30 2e 6d 6f 6e 67  6f 64 62 2e 6c 6f 63 61  d e 0 . m o n g  o d b . l o c a"
     stream: TRACE   > 000f0:  6c 3a 32 37 30 31 37 00  03 6c 61 73 74 57 72 69  l : 2 7 0 1 7 .  . l a s t W r i"
     stream: TRACE   > 00100:  74 65 00 40 00 00 00 03  6f 70 54 69 6d 65 00 1c  t e . @ . . . .  o p T i m e . ."
     stream: TRACE   > 00110:  00 00 00 11 74 73 00 01  00 00 00 6f 42 cb 58 12  . . . . t s . .  . . . o B . X ."
     stream: TRACE   > 00120:  74 00 02 00 00 00 00 00  00 00 00 09 6c 61 73 74  t . . . . . . .  . . . . l a s t"
     stream: TRACE   > 00130:  57 72 69 74 65 44 61 74  65 00 98 81 fb d9 5a 01  W r i t e D a t  e . . . . . Z ."
     stream: TRACE   > 00140:  00 00 00 10 6d 61 78 42  73 6f 6e 4f 62 6a 65 63  . . . . m a x B  s o n O b j e c"
     stream: TRACE   > 00150:  74 53 69 7a 65 00 00 00  00 01 10 6d 61 78 4d 65  t S i z e . . .  . . . m a x M e"
     stream: TRACE   > 00160:  73 73 61 67 65 53 69 7a  65 42 79 74 65 73 00 00  s s a g e S i z  e B y t e s . ."
     stream: TRACE   > 00170:  6c dc 02 10 6d 61 78 57  72 69 74 65 42 61 74 63  l . . . m a x W  r i t e B a t c"
     stream: TRACE   > 00180:  68 53 69 7a 65 00 e8 03  00 00 09 6c 6f 63 61 6c  h S i z e . . .  . . . l o c a l"
     stream: TRACE   > 00190:  54 69 6d 65 00 8b 94 fb  d9 5a 01 00 00 10 6d 61  T i m e . . . .  . Z . . . . m a"
     stream: TRACE   > 001a0:  78 57 69 72 65 56 65 72  73 69 6f 6e 00 05 00 00  x W i r e V e r  s i o n . . . ."
     stream: TRACE   > 001b0:  00 10 6d 69 6e 57 69 72  65 56 65 72 73 69 6f 6e  . . m i n W i r  e V e r s i o n"
     stream: TRACE   > 001c0:  00 00 00 00 00 08 72 65  61 64 4f 6e 6c 79 00 00  . . . . . . r e  a d O n l y . ."
     stream: TRACE   > 001d0:  01 6f 6b 00 00 00 00 00  00 00 f0 3f 00           . o k . . . . .  . . . ? ."
     stream: TRACE   >  EXIT: mongoc_stream_readv():239"
     stream: TRACE   >  EXIT: mongoc_stream_read():280"
     mongoc: TRACE   > ENTRY: _mongoc_populate_cmd_error():767"
     mongoc: TRACE   >  EXIT: _mongoc_populate_cmd_error():773"
    cluster: TRACE   >  EXIT: mongoc_cluster_run_command_internal():330"
     mongoc: TRACE   > ENTRY: mongoc_server_description_init():94"
     mongoc: TRACE   >  EXIT: mongoc_server_description_init():131"
     mongoc: TRACE   > ENTRY: mongoc_server_description_handle_ismaster():497"
     mongoc: TRACE   >  EXIT: mongoc_server_description_handle_ismaster():624"
     mongoc: TRACE   > ENTRY: mongoc_server_description_handle_ismaster():497"
     mongoc: TRACE   >  EXIT: mongoc_server_description_handle_ismaster():624"
     mongoc: TRACE   > TRACE: mongoc_topology_description_handle_ismaster():1726 Transitioning to RSWithPrimary for RSSecondary"
    cluster: TRACE   >  EXIT: _mongoc_stream_run_ismaster():464"
    cluster: TRACE   >  EXIT: _mongoc_cluster_stream_for_server():1533"
    cluster: TRACE   >  EXIT: mongoc_cluster_stream_for_server():1578"
     mongoc: TRACE   > ENTRY: _mongoc_write_command_execute():1395"
     mongoc: TRACE   > ENTRY: _mongoc_write_command():1233"
     mongoc: TRACE   > ENTRY: _mongoc_write_command_init():271"
     mongoc: TRACE   >  EXIT: _mongoc_write_command_init():290"
    cluster: TRACE   > ENTRY: mongoc_cluster_run_command_internal():170"
     stream: TRACE   > ENTRY: _mongoc_stream_writev_full():490"
     stream: TRACE   > ENTRY: mongoc_stream_writev():149"
     stream: TRACE   > TRACE: mongoc_stream_writev():161 writev = 0x561dc49bba60 [9]"
     stream: TRACE   > 00000:  a3 00 00 00 07 00 00 00  00 00 00 00 d4 07 00 00  . . . . . . . .  . . . . . . . ."
     stream: TRACE   > 00010:  00 00 00 00 74 65 73 74  69 6e 67 2e 24 63 6d 64  . . . . t e s t  i n g . $ c m d"
     stream: TRACE   > 00020:  00 00 00 00 00 ff ff ff  ff 7a 00 00 00 02 69 6e  . . . . . . . .  . z . . . . i n"
     stream: TRACE   > 00030:  73 65 72 74 00 0b 00 00  00 77 72 69 74 65 5f 74  s e r t . . . .  . w r i t e _ t"
     stream: TRACE   > 00040:  65 73 74 00 03 77 72 69  74 65 43 6f 6e 63 65 72  e s t . . w r i  t e C o n c e r"
     stream: TRACE   > 00050:  6e 00 05 00 00 00 00 08  6f 72 64 65 72 65 64 00  n . . . . . . .  o r d e r e d ."
     stream: TRACE   > 00060:  01 04 64 6f 63 75 6d 65  6e 74 73 00 36 00 00 00  . . d o c u m e  n t s . 6 . . ."
     stream: TRACE   > 00070:  03 30 00 2e 00 00 00 02  74 65 78 74 00 0e 00 00  . 0 . . . . . .  t e x t . . . ."
     stream: TRACE   > 00080:  00 35 38 63 62 34 32 37  33 63 66 62 37 30 00 07  . 5 8 c b 4 2 7  3 c f b 7 0 . ."
     stream: TRACE   > 00090:  5f 69 64 00 58 cb 42 73  6e 07 1d 00 08 52 33 98  _ i d . X . B s  n . . . . R 3 ."
     stream: TRACE   > 000a0:  00 00 00                                          . . ."
     stream: TRACE   > ENTRY: _mongoc_stream_socket_writev():202"
     socket: TRACE   > ENTRY: mongoc_socket_sendv():1089"
     socket: TRACE   > ENTRY: _mongoc_socket_try_sendv():998"
     socket: TRACE   > TRACE: _mongoc_socket_try_sendv():1004 sendbuf = 0x561dc49ba960 [9]"
     socket: TRACE   > 00000:  a3 00 00 00 07 00 00 00  00 00 00 00 d4 07 00 00  . . . . . . . .  . . . . . . . ."
     socket: TRACE   > 00010:  00 00 00 00 74 65 73 74  69 6e 67 2e 24 63 6d 64  . . . . t e s t  i n g . $ c m d"
     socket: TRACE   > 00020:  00 00 00 00 00 ff ff ff  ff 7a 00 00 00 02 69 6e  . . . . . . . .  . z . . . . i n"
     socket: TRACE   > 00030:  73 65 72 74 00 0b 00 00  00 77 72 69 74 65 5f 74  s e r t . . . .  . w r i t e _ t"
     socket: TRACE   > 00040:  65 73 74 00 03 77 72 69  74 65 43 6f 6e 63 65 72  e s t . . w r i  t e C o n c e r"
     socket: TRACE   > 00050:  6e 00 05 00 00 00 00 08  6f 72 64 65 72 65 64 00  n . . . . . . .  o r d e r e d ."
     socket: TRACE   > 00060:  01 04 64 6f 63 75 6d 65  6e 74 73 00 36 00 00 00  . . d o c u m e  n t s . 6 . . ."
     socket: TRACE   > 00070:  03 30 00 2e 00 00 00 02  74 65 78 74 00 0e 00 00  . 0 . . . . . .  t e x t . . . ."
     socket: TRACE   > 00080:  00 35 38 63 62 34 32 37  33 63 66 62 37 30 00 07  . 5 8 c b 4 2 7  3 c f b 7 0 . ."
     socket: TRACE   > 00090:  5f 69 64 00 58 cb 42 73  6e 07 1d 00 08 52 33 98  _ i d . X . B s  n . . . . R 3 ."
     socket: TRACE   > 000a0:  00 00 00                                          . . ."
     socket: TRACE   > TRACE: _mongoc_socket_try_sendv():1020 Send 163 out of 4 bytes"
     socket: TRACE   >  EXIT: _mongoc_socket_try_sendv():1050"
     socket: TRACE   > TRACE: mongoc_socket_sendv():1100 Sent 163 (of 4) out of iovcnt=9"
     socket: TRACE   > TRACE: mongoc_socket_sendv():1125 still got bytes left: sent -= iov_len: 163 -= 4"
     socket: TRACE   > TRACE: mongoc_socket_sendv():1125 still got bytes left: sent -= iov_len: 159 -= 4"
     socket: TRACE   > TRACE: mongoc_socket_sendv():1125 still got bytes left: sent -= iov_len: 155 -= 4"
     socket: TRACE   > TRACE: mongoc_socket_sendv():1125 still got bytes left: sent -= iov_len: 151 -= 4"
     socket: TRACE   > TRACE: mongoc_socket_sendv():1125 still got bytes left: sent -= iov_len: 147 -= 4"
     socket: TRACE   > TRACE: mongoc_socket_sendv():1125 still got bytes left: sent -= iov_len: 143 -= 13"
     socket: TRACE   > TRACE: mongoc_socket_sendv():1125 still got bytes left: sent -= iov_len: 130 -= 4"
     socket: TRACE   > TRACE: mongoc_socket_sendv():1125 still got bytes left: sent -= iov_len: 126 -= 4"
     socket: TRACE   > TRACE: mongoc_socket_sendv():1125 still got bytes left: sent -= iov_len: 122 -= 122"
     socket: TRACE   > TRACE: mongoc_socket_sendv():1134 Finished the iovecs"
     socket: TRACE   >  EXIT: mongoc_socket_sendv():1165"
     socket: TRACE   > TRACE: mongoc_socket_errno():307 Current errno: 0"
     stream: TRACE   >  EXIT: _mongoc_stream_socket_writev():208"
     stream: TRACE   >  EXIT: mongoc_stream_writev():164"
     stream: TRACE   > TRACE: _mongoc_stream_writev_full():497 writev returned: 163"
     stream: TRACE   >  EXIT: _mongoc_stream_writev_full():521"
     stream: TRACE   > ENTRY: mongoc_stream_read():268"
     stream: TRACE   > ENTRY: mongoc_stream_readv():226"
     stream: TRACE   > ENTRY: _mongoc_stream_socket_readv():139"
     socket: TRACE   > ENTRY: mongoc_socket_recv():797"
     socket: TRACE   >  EXIT: mongoc_socket_recv():826"
     stream: TRACE   >  EXIT: _mongoc_stream_socket_readv():188"
     stream: TRACE   > TRACE: mongoc_stream_readv():236 readv = 0x7fff8f7fd400 [1]"
     stream: TRACE   > 00000:  6e 00 00 00 3c 00 00 00  07 00 00 00 01 00 00 00  n . . . < . . .  . . . . . . . ."
     stream: TRACE   > 00010:  08 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  . . . . . . . .  . . . . . . . ."
     stream: TRACE   > 00020:  01 00 00 00                                       . . . ."
     stream: TRACE   >  EXIT: mongoc_stream_readv():239"
     stream: TRACE   >  EXIT: mongoc_stream_read():280"
     stream: TRACE   > ENTRY: mongoc_stream_read():268"
     stream: TRACE   > ENTRY: mongoc_stream_readv():226"
     stream: TRACE   > ENTRY: _mongoc_stream_socket_readv():139"
     socket: TRACE   > ENTRY: mongoc_socket_recv():797"
     socket: TRACE   >  EXIT: mongoc_socket_recv():826"
     stream: TRACE   >  EXIT: _mongoc_stream_socket_readv():188"
     stream: TRACE   > TRACE: mongoc_stream_readv():236 readv = 0x7fff8f7fd400 [1]"
     stream: TRACE   > 00000:  4a 00 00 00 01 6f 6b 00  00 00 00 00 00 00 00 00  J . . . . o k .  . . . . . . . ."
     stream: TRACE   > 00010:  02 65 72 72 6d 73 67 00  0b 00 00 00 6e 6f 74 20  . e r r m s g .  . . . . n o t  "
     stream: TRACE   > 00020:  6d 61 73 74 65 72 00 10  63 6f 64 65 00 7b 27 00  m a s t e r . .  c o d e . { ' ."
     stream: TRACE   > 00030:  00 02 63 6f 64 65 4e 61  6d 65 00 0a 00 00 00 4e  . . c o d e N a  m e . . . . . N"
     stream: TRACE   > 00040:  6f 74 4d 61 73 74 65 72  00 00                    o t M a s t e r  . ."
     stream: TRACE   >  EXIT: mongoc_stream_readv():239"
     stream: TRACE   >  EXIT: mongoc_stream_read():280"
     mongoc: TRACE   > ENTRY: _mongoc_populate_cmd_error():767"
     mongoc: TRACE   >  EXIT: _mongoc_populate_cmd_error():795"
    cluster: TRACE   >  GOTO: mongoc_cluster_run_command_internal():282 done"
    cluster: TRACE   >  EXIT: mongoc_cluster_run_command_internal():330"
     mongoc: TRACE   > ENTRY: _mongoc_write_result_merge():1767"
     mongoc: TRACE   >  EXIT: _mongoc_write_result_merge():1876"
     mongoc: TRACE   >  EXIT: _mongoc_write_command():1381"
     mongoc: TRACE   >  EXIT: _mongoc_write_command_execute():1444"
     mongoc: TRACE   >  GOTO: mongoc_bulk_operation_execute():672 cleanup"
     mongoc: TRACE   > ENTRY: _mongoc_write_result_complete():1961"
     mongoc: TRACE   >  EXIT: _mongoc_write_result_complete():2010"
     mongoc: TRACE   > ENTRY: mongoc_server_description_destroy():153"
     mongoc: TRACE   >  EXIT: mongoc_server_description_destroy():159"
     mongoc: TRACE   >  EXIT: mongoc_bulk_operation_execute():687"
child 8 said into stderr: "not master"
     mongoc: TRACE   > ENTRY: _mongoc_write_command_destroy():1451"
     mongoc: TRACE   >  EXIT: _mongoc_write_command_destroy():1457"
     mongoc: TRACE   > ENTRY: _mongoc_write_result_destroy():1481"
     mongoc: TRACE   >  EXIT: _mongoc_write_result_destroy():1489"
     mongoc: DEBUG   > Not destroying persistent client for Manager"
     mongoc: DEBUG   > Not destroying persistent client for Manager"

The server node0.mongodb.local:27017 is selected as the primary according to the current topology state, but the driver has no existing connection to it. After creating a new connection, the driver issues an isMaster command to that node. The isMaster result reports that node0 is actually a secondary and that node2 is the primary; however, the write is still issued to node0 and we see a “not master” error. I suspected that this might be due to the PHP library selecting a server in Collection::insertOne() before we actually construct and execute the InsertOne operation class, but bypassing the library and using the driver directly produced the same result. I will follow up with the libmongoc folks to share the trace logs and determine whether “not master” is expected behavior in this scenario.

Regarding “Retry ismaster calls once”, @ajdavis confirmed that libmongoc currently does not do that for either multi- or single-threaded SDAM implementations. He opened CDRIVER-2075 to track the missing functionality.

I’ll look into the Docker environment you shared and follow up in a bit.