mongoose: uncaughtException on network outage

Do you want to request a feature or report a bug? bug

What is the current behavior? When there is a network outage, we are experiencing an uncaughtException in our application, causing the process to terminate.

If the current behavior is a bug, please provide the steps to reproduce. We have created a simple test to reproduce this issue.

Below are the options for mongoose:

{
    useNewUrlParser: true,
    useUnifiedTopology: true,
    serverSelectionTimeoutMS: 20000,
    connectTimeoutMS: 8000,
    socketTimeoutMS: 10000,
    family: 4
}

and our URL has the format: mongodb+srv://

Every 5s, we run:

const user = await User.findOne({ 'id': '1234' });
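For anyone trying to reproduce this, the polling test described here could look roughly like the sketch below. The helper names `runQueryOnce` and `startPolling` are hypothetical and not taken from the reporter's script; the query is passed in as a function so the loop can be exercised without a database.

```javascript
// Hypothetical helper: runs `query` once and reports the outcome instead of throwing.
// `query` is any function returning a promise, e.g. () => User.findOne({ id: '1234' }).
async function runQueryOnce(query, log = console.log) {
  try {
    const result = await query();
    log('query ok');
    return { ok: true, result };
  } catch (err) {
    // Only errors that reject the query promise land here. An error emitted
    // elsewhere (for example directly on a socket) would not be caught.
    log(`query failed: ${err.message}`);
    return { ok: false, error: err };
  }
}

// Hypothetical helper: repeats the query every `intervalMs` milliseconds.
// Returns a function that stops the loop.
function startPolling(query, intervalMs = 5000, log = console.log) {
  const timer = setInterval(() => {
    runQueryOnce(query, log);
  }, intervalMs);
  return () => clearInterval(timer);
}

// Usage in the application (assumes a Mongoose model named User):
// const stop = startPolling(() => User.findOne({ id: '1234' }));
```

With this shape, any rejection from `findOne` is logged and the loop keeps running, so a process exit during the outage points at something outside the query promise.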

At random intervals we unplug the network cable, and within 5 to 10 tries we get the following from the uncaughtException handler (on Node 8.9.3):

2020-01-05 18:58:12.0570 - error: uncaughtException: read ECONNRESET
Error: read ECONNRESET
    at _errnoException (util.js:1024:11)
    at TLSWrap.onread (net.js:615:25)

On Node 12, this is the error:

2020-01-12 19:08:52.0880 - error: uncaughtException: read ECONNRESET
Error: read ECONNRESET
    at TLSWrap.onStreamRead (internal/stream_base_commons.js:201:27)

What is the expected behavior? This condition should be handled by the underlying library, to prevent the application from crashing.
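As background on why the application cannot catch this itself: in Node.js, an 'error' event emitted on an EventEmitter with no 'error' listener is thrown. The stack traces above contain only TLS socket frames, which would be consistent with that mechanism, but this thread does not confirm it as the root cause. The sketch below uses a plain EventEmitter as a stand-in for the socket; `emitSocketError` is a hypothetical name.

```javascript
const { EventEmitter } = require('events');

// Stand-in for a socket; the object in the stack traces above is a TLS socket.
function emitSocketError(emitter) {
  emitter.emit('error', new Error('read ECONNRESET'));
}

// Case 1: no 'error' listener, so emit() throws the error. On a real socket the
// emit happens inside an I/O callback, with no application code on the stack,
// so the throw surfaces as an uncaughtException.
const unguarded = new EventEmitter();
let thrown = null;
try {
  emitSocketError(unguarded);
} catch (err) {
  // Catchable here only because this demo calls emit() synchronously itself.
  thrown = err;
}

// Case 2: with an 'error' listener, the error is delivered to the listener.
const guarded = new EventEmitter();
let handled = null;
guarded.on('error', (err) => {
  handled = err;
});
emitSocketError(guarded);

console.log('thrown:', thrown.message, '| handled:', handled.message);
```

If this is what happens here, the listener would have to be attached inside the driver, because the application has no reference to the socket.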

What are the versions of Node.js, Mongoose and MongoDB you are using? Note that “latest” is not a version.

  • Node.js: 8.9.3 and 12.13.0
  • Mongoose: 5.8.4 (also tested on the latest commit of the master branch, 58c4c1af426b1a7e95eb736a5f227f5553189368)
  • MongoDB: 3.4.0

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 20

Most upvoted comments

OK, a couple of things.

First, I tracked down the culprit exiting the application. It was the logger winston:

By default, winston will exit after logging an uncaughtException. If this is not the behavior you want, set exitOnError = false

Setting exitOnError = false will at least buy us some time until the bug is found and fixed.
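For reference, a minimal sketch of that setting, assuming winston 3.x and a console transport (the transport choice is an assumption, not taken from our actual configuration):

```javascript
const winston = require('winston');

// exitOnError: false asks winston not to call process.exit() after it logs
// an uncaught exception. handleExceptions: true makes this transport the one
// that logs uncaught exceptions.
const logger = winston.createLogger({
  exitOnError: false,
  transports: [
    new winston.transports.Console({ handleExceptions: true }),
  ],
});

logger.info('logger configured with exitOnError = false');
```

This only keeps the process alive; it does not say anything about the state of the driver after the exception.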

Second, we reverted to the non-unified topology:

useUnifiedTopology: false,

The app seems stable, so the problem could be due to this flag (which is meant to default to true in the future).

Not sure if it helps, but we noticed that without unified topology we receive the “disconnected”, “connected” and “reconnected” events, whereas with it we do not see those events. The errors are also different; we get a mixture of:

  • MongoError: no primary server available
  • MongoError: Pool was force destroyed
  • MongoNetworkError: write ECONNRESET

So perhaps there is still some work to get unified topology working reliably.
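To compare the two topologies, the events mentioned above can be logged with something like the sketch below. `logConnectionEvents` is a hypothetical helper; it accepts any EventEmitter, so it is shown with a stand-in emitter and would be passed `mongoose.connection` in a real application.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: logs connection state changes on any emitter.
// Returns the list of event names it subscribed to.
function logConnectionEvents(connection, log = console.log) {
  const events = ['connected', 'disconnected', 'reconnected', 'error'];
  for (const name of events) {
    connection.on(name, (err) => {
      // Error objects are passed for 'error'; other events may carry no argument.
      log(err instanceof Error ? `${name}: ${err.message}` : name);
    });
  }
  return events;
}

// Stand-in demo; with Mongoose this would be logConnectionEvents(mongoose.connection).
const fakeConnection = new EventEmitter();
logConnectionEvents(fakeConnection);
fakeConnection.emit('disconnected');
```

Running the same listener set with useUnifiedTopology set to true and then false should make it clear which events go missing.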

I will continue to observe the behavior, and report here if I can track it down further.

I think this might be thrown by the underlying MongoDB driver, and I believe this is by design as well.

To prevent your application from crashing, try the code snippet below; it should help.

process.on('uncaughtException', (err) => {
  console.log('Uncaught exception:', err);
});

process.on('unhandledRejection', (err) => {
  console.log('Unhandled rejection:', err);
});

On a side note, I think you should handle the connection problem rather than work around it. Otherwise you will fill your codebase with hacks, and there will always be unexpected scenarios.

As mentioned above, we found that the winston logger was calling process.exit() on an uncaught exception. After we disabled that, the app no longer exits on the exception. I did not investigate the state of the driver at that point, but I would assume it is in a bad state, e.g. reconnects could be broken.

After finding out it is caused by unified topology, I stopped my investigation and switched that off.

@keithchew can you please share the script you wrote? It would be very helpful in our attempts to track this down.

I have already done that, which is how I am printing the logs of the uncaughtException in the bug report.

Note that I am not trying to work around anything, as this is an uncaughtException beneath the application. If the error were propagated up to the application, we would at least have a chance to work around it, but that is not the case here. This is a simple test: every 5s we call findOne in a try/catch block to retrieve a record from the DB…