mongoose: Cursor not found - while iterating over sharded collection
Do you want to request a feature or report a bug? Bug
What is the current behavior? Hi all, I ran into a very weird issue. I have a MongoDB cluster (latest 3.6.5) with 2 shards, each a replica set. Iterating over one sharded collection with about 300,000 entries randomly fails with:
```
Unhandled rejection MongoError: Cursor not found (namespace: 'hocknfinder.users', id: 1684357377346700205).
    at /home/simon/Dev/Web/hokify-server/node_modules/mongodb-core/lib/connection/pool.js:598:61
    at authenticateStragglers (/home/simon/Dev/Web/hokify-server/node_modules/mongodb-core/lib/connection/pool.js:516:16)
    at Connection.messageHandler (/home/simon/Dev/Web/hokify-server/node_modules/mongodb-core/lib/connection/pool.js:552:5)
    at emitMessageHandler (/home/simon/Dev/Web/hokify-server/node_modules/mongodb-core/lib/connection/connection.js:309:10)
    at Socket.<anonymous> (/home/simon/Dev/Web/hokify-server/node_modules/mongodb-core/lib/connection/connection.js:452:17)
    at Socket.emit (events.js:182:13)
    at Socket.EventEmitter.emit (domain.js:442:20)
    at addChunk (_stream_readable.js:277:12)
    at readableAddChunk (_stream_readable.js:262:11)
    at Socket.Readable.push (_stream_readable.js:217:10)
    at TCP.onread (net.js:638:20)
```
I set cursorTimeoutMillis on mongos to 30 minutes (18000000) quite a while ago to work around this kind of issue, but now the error is back within seconds sometimes (it never takes longer than 5 minutes until the exception occurs). The problem is that I'm unable to narrow it down. The failing script opens a find() cursor and iterates over it. Every row is pushed into a data array; as soon as this array has 25,000 entries, it is sent to another server via HTTP and emptied again. While the upload is running, the cursor fills it up again. I tried to narrow it down, but as soon as I change the upload code (e.g. dropping the library and doing a plain HTTP request) things behave differently, and sometimes the script even finishes without any errors. Therefore I guess it must somehow be related to the network. It seems that every time the "PUT" HTTP call completes, it somehow confuses the MongoDB driver: the "Cursor not found" error is most of the time followed by the "upload successful" message.
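The accumulate-and-flush pattern described above can be sketched as a small standalone helper, isolated from mongoose and the network so it can be tested on its own (hypothetical names; the real script uses a 25,000-entry bulk size and an HTTP upload as the flush callback):

```javascript
// Hypothetical sketch of the script's batching logic: rows are accumulated
// until flushAt is reached, then the full batch is handed to onFlush (in the
// real script, the HTTP upload) and accumulation starts over.
function makeBatcher(flushAt, onFlush) {
  let data = [];
  return {
    push(row) {
      data.push(row);
      if (data.length >= flushAt) {
        const batch = data;
        data = [];
        return onFlush(batch); // caller may await the returned upload promise
      }
      return null;
    },
    remaining() {
      return data.length; // rows still waiting for the next flush
    }
  };
}
```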
What I tried so far:
- Setting poolSize to 1… no change.
- Increasing every kind of timeout.
- Setting debug to true to see what's going on ==> then it does not fail, but it never finishes either! It gets stuck in an "add getMore to queue" call or something similar…
If the current behavior is a bug, please provide the steps to reproduce.
```javascript
const Promise = require('bluebird');
const mongoose = require('mongoose');
const request = require('request-promise-native');

mongoose.Promise = Promise;
mongoose.connect(database.url, database.opts); // app-specific connection config

const MyTable = models('user'); // a mongoose model
const bulkSize = 25000;

const MongooseStream = MyTable.find({})
  .lean()
  .cursor({ batchSize: bulkSize });

let data = [];
console.log('start run...');

async function uploadData(committed, csv, retried) {
  console.log(`uploading part ${committed}... (${csv.length / 1024 / 1024} mb)`);
  const req = {
    url: 'https://hokify.com/non-existing-endpoint',
    headers: { 'Content-Type': 'text/csv' },
    body: csv,
    method: 'PUT'
  };
  return request(req)
    .catch(() => {}); // this fails for sure due to the "fake" endpoint
}

let committed = 0;
MongooseStream.eachAsync(async (obj) => {
  if (data.length % 300 === 0) console.log(Math.round(data.length / bulkSize * 100) + '%\r');
  data.push(obj);
  if (data.length > bulkSize) {
    const clonedData = data;
    data = [];
    console.log('processing...', new Date(), clonedData.length);
    return uploadData(++committed, JSON.stringify(clonedData))
      .then(() => {
        console.log('processing partially done');
      });
  }
}, { parallel: 10 })
  .then(async () => {
    console.log('ALL DONE', data.length + ' entries left to process');
  });
```
This script behaves differently depending on the timeouts:
1.) If I lower the timeout values to keepAlive: 30000, connectTimeoutMS: 3000, socketTimeoutMS: 36000, reconnectTries: 3, reconnectInterval: 100, it NEVER finishes; it gets stuck.
2.) If I set the timeout values to connectTimeoutMS: 30000, socketTimeoutMS: 360000, reconnectTries: 30, reconnectInterval: 1000, it sometimes finishes, sometimes hangs, and very rarely I can reproduce the "Cursor not found" error.
3.) Increasing the values beyond that doesn't change much.
What is the expected behavior? The script should just finish fine; when all goes well, the total execution takes no more than 7 minutes. Therefore there should never be any kind of cursor timeout.
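As a rough sanity check on that expectation (numbers taken from the report above):

```javascript
// With ~300,000 documents and a bulk size of 25,000, the script performs
// about 12 uploads. At the reported ~30-40 s per upload, overlapped with
// cursor reads, the run stays well inside MongoDB's default 10-minute
// cursor idle timeout, so no timeout should ever fire.
const totalDocs = 300000;
const bulkSize = 25000;
const uploads = Math.ceil(totalDocs / bulkSize); // 12
```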
Please mention your node.js, mongoose and MongoDB version. node v10.1.0 (but also happening with node 8), mongoose 5.1.2, MongoDB 3.6.5.
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Comments: 16 (10 by maintainers)
I’ve opened a JIRA issue. The issue is definitely somewhere in the implicit session support that was added to the native driver between 3.0.4 and 3.0.5.
https://jira.mongodb.org/browse/NODE-1482
Thanks
Very interesting outcome:
- mongoose 5.0.14 works; mongoose 5.1.2 doesn’t
- running mongoose 5.1.2 with mongodb & mongodb-core 3.0.4 works
- running mongoose 5.1.2 with mongodb & mongodb-core 3.0.5 results in:
Unhandled rejection MongoError: Cursor session id (none) is not the same as the operation context's session id (0b03db89-0000-4653-a369-3a744c84bfa7 - J9HIfMrQRlAhC0lvltFaplCk1dIVYfrjXGoZMXBckbE=)
Fixing this error manually by patching lib/bulk/common.js with https://github.com/mongodb-js/mongodb-core/compare/v3.0.7...v3.0.8#diff-95549f1db7facd9411362e6554f63e2f again returns Unhandled rejection MongoError: Cursor not found (namespace: 'hocknfinder.users', id: 7904791698994979006).
So far I can say it has to do with mongodb-core. Looking at the changeset from 3.0.4 to 3.0.5 (https://github.com/mongodb-js/mongodb-core/compare/v3.0.4...v3.0.5), the only changes in there are about closing sessions… could someone who understands the code have a look at what might cause the weird behaviour? Thanks 😃
Simon
@vkarpov15 thanks, I already tried parallel: 1 and a bunch of different batchSizes. I guess the issue here is either MongoDB-related or somewhere “deep” in the driver. I tried to reproduce it with the “cpu” limitation today but couldn’t figure out any relation to it. Yet it keeps happening. Especially weird because I’ve set the noCursorTimeout flag, which I also debugged and which is indeed part of the query.
And regarding your question about the upload: it takes about 50% of the time until the next “data”-set is complete, around 30-40 seconds.
Is there any option to disable the “session” support temporarily to see if this is related? EDIT: I just remembered your post about .find({}, null, { skipSessions: true })… I’m trying right now to see if it makes any difference.
EDIT2: Seems not related to sessions (assuming skipSessions does what I think it does): Unhandled rejection MongoError: Cursor not found (namespace: 'hocknfinder.users', id: 6724850416504880802) after not even one minute.
EDIT3: I’ve never managed to get the cursor timeout within the first “batch”; it comes in the second or any following one. So increasing the batchSize even helps, the very opposite of what I expected. Well, I just increased the batch size to over 60,000 and then I also saw it in the first batch, so my last statement no longer holds.
EDIT4: I can definitely verify that it is not happening with mongoose 5.0.14. I’m now trying to track it down to the mongodb-core or mongodb package.
Thanks