nodejs-spanner: runStream returning incomplete and out of order rows

Environment details

  • OS: Ubuntu 16.04.4 LTS
  • Node.js version: 6.14.1
  • npm version: 5.8.0
  • @google-cloud/spanner version: 1.4.1

Steps to reproduce

I have a table with this structure:

CREATE TABLE accounts (
    recordId STRING(36) NOT NULL,
    account_CID STRING(255),
    account_created_on STRING(1024),
    account_created_on_unixTimestamp FLOAT64,
    account_dashboard_tabs STRING(255),
    account_ok_tabs STRING(255),
    account_summary_content_tabs STRING(255),
    account_to_creator STRING(36),
    account_to_customer_admin STRING(36),
    account_view_tabs STRING(255),
    Address STRING(1024),
    Address1 STRING(1024),
    Address1_city STRING(255),
    Address1_postalcode STRING(255),
    Address1_provinceterritory STRING(255),
    Address_city STRING(255),
    Address_postalcode STRING(255),
    Address_provinceterritory STRING(255),
    current_document_number FLOAT64,
    customer_content_tabs STRING(255),
    Email STRING(255),
    fund_to_account STRING(36),
    FundsTab_AccountSubNavigation STRING(255),
    Number FLOAT64,
    Number1 FLOAT64,
    Number_of_Payments FLOAT64,
    payment_plan_to_account STRING(36),
    Phone_Number STRING(1024),
    Phone_Number1 STRING(1024),
    requested_payment_plan_to_account STRING(36),
    RequestRelationshipCID STRING(255),
    RequestRelationshipContentTab STRING(255),
    Selection_Field___Static STRING(MAX),
    Short_Text STRING(255),
    Short_Text1 STRING(255),
    Short_Text4 STRING(255),
    upload_customer_csv STRING(1024),
    upload_customer_csv_name STRING(255),
    user_to_accounts STRING(36),
) PRIMARY KEY (recordId)

When I run the following query:

SELECT `root`.`account_created_on` as `field0`, `root`.`Short_Text` as `field1`, `root`.`Short_Text1` as `field2`, `root`.`account_CID` as `field3`, `root`.`recordId` as `recordId` FROM `accounts` AS root

My table has 17349 records

Occasionally (1 in 5) when I run this query using:

database.runStream(query)
	.on('error', function (err) {
		console.error(err);
		return reject(err);
	})
	.on('data', function (row) {
		let rowJSON = row.toJSON();
		if (!rowJSON.recordId || rowJSON.recordId.match(/^\d+$/)) {
			console.log('Worker '+process.pid+' Row: '+JSON.stringify(rowJSON));
			console.log(row);
		}
		records[rowJSON.recordId] = rowJSON;
	})
	.on('end', function () {
		return resolve(records);
	});

I will get rows that are missing the recordId and the values will be the wrong labels for example field1 will contain the data for field2

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Comments: 31 (17 by maintainers)

Commits related to this issue

Most upvoted comments

We’ve had a cache refresh task running using this version of the library for the past ~30 which was previously fairly consistently throwing the errors described in #197. With this change I’ve seen none such errors so far.
🎉

cc @danoscarmike @snehashah16 @stephenplusplus This has been pending for many days now. Can we please prioritize this?

@bbransteitter, after looking at @stephenplusplus repro I think I have discovered the issue. This resolved it in the repro we had. Would you be interested in trying a custom build to see if it fixes your problem? If you install in the following way you can try out my fix: npm install googleapis/nodejs-spanner#180-crwilcox

Is @callmehiphop currently out of office? If not, let’s pull them in. This is a P0 customer issue, it takes priority over feature work.

A patch release is published that includes this fix. https://www.npmjs.com/package/@google-cloud/spanner

Hey @walshie4, the branch is updated but what is pushed is stable/passing. We are at a place now where our existing tests all pass. The team still has to commit a few new tests to cover changes but then we plan to merge/release this.

@crwilcox should be able to answer that 😃

@walshie4 . I think two issues may be conflated here. It seems the fix in #204 does fix this issue (#180) but it doesn’t address #198, which was originally believed to be the same root cause. If you think this is correct I would like to reopen your issue and spin it off as to not cross the two things up too much.

@bbransteitter , did this work for you?

@walshie4 thank you so much for giving this a whirl. I just updated the PR. @stephenplusplus and I came to a different solution that, after a good deal of testing, seems to return the rows. The previous solution had a side-effect of occasionally returning less rows than requested.

if you retry with npm install googleapis/nodejs-spanner#180-crwilcox and could let me know how that works for you it would be awesome.

Thanks again for helping us get to the bottom of this.

Thank you for the update @crwilcox I will try and test it tomorrow Friday 5/23

I’m risking of providing some irrelevant input as I don’t really know this code, but as you asked me, I spent some time looking at toJSON and its roots and I see just one place where things could get messed up and it’s this code which is based on the strong assumption that iterating the values using values.forEach(function(value, index) { ... }) gives the same amount of fields in the same order as stored in this.fields. If it’s not the case for whatever reason, the resulting JSON is screwed up exactly like we see in the issue.

I’m working on getting a consistent reproduction case, then I’ll make that available for anyone who wants to take a crack at it (including myself). So far, I have a theory, @crwilcox is looking at something else that could be the issue. At some point, we’re probably going to have to involve @callmehiphop, though I’m trying to excuse him from this one as long as possible, given his ongoing work with other Spanner priorities.

I’m hoping to have the reproduction case together tomorrow, and I’ll update with any progress after.