syncstorage-rs: Seeing spanner 20k mutation limit errors for inserts in production

I have a large collection of bookmarks (3078 bookmarks and 546 folders) that has suddenly started failing to sync in production.

Sync id: 130387415. Via bob, I got the following from the production logs:

```
A database error occurred: RpcFailure(RpcStatus { status: RpcStatusCode(3), details: Some("The transaction contains too many mutations. Insert and update operations count with the multiplicity of the number of columns they affect. For example, inserting values into one key column and four non-key columns count as five mutations total for the insert. Delete and delete range operations count as one mutation regardless of the number of columns affected. The total mutation count includes any changes to indexes that the transaction generates. Please reduce the number of writes, or use fewer indexes. (Maximum number: 20000)") }), status: 500 }
```
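For reference, the counting rule the error describes can be sketched as a tiny helper. This is just an illustration of the rule, not code from the server; the insert example is the one from the error message itself:

```rust
/// Rough mutation cost of a single insert/update under the rule quoted above:
/// one mutation per column written, plus any index entries the write touches.
fn insert_mutations(columns_written: u64, index_entries_touched: u64) -> u64 {
    columns_written + index_entries_touched
}

/// Deletes (and delete ranges) count as one mutation regardless of columns.
fn delete_mutations() -> u64 {
    1
}

fn main() {
    // The error's own example: one key column + four non-key columns = 5 mutations.
    assert_eq!(insert_mutations(5, 0), 5);
    assert_eq!(delete_mutations(), 1);
    println!("per-transaction ceiling: 20000 mutations");
}
```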

Here’s a link to a sample anonymized collection of this same size.

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 25 (25 by maintainers)


Most upvoted comments

@pjenvey I was never able to repro the issue again after the original fix went out. Can you think of any way we can test to confirm the new 1666 limit resolved the issue?

We can at least confirm the recurring sentry events stop on stage/prod. So far so good (they’ve stopped), but let’s give prod some more time just in case.

@tublitzed let’s get #377 deployed and confirmed on Monday, then confirm the new limit over on https://github.com/mozilla-mobile/firefox-ios/issues/5896.

Thanks! That narrowed it down.

The final commit includes the last 99 items of the batch in the same request. The handler first writes those 99 additions, then commits all 1999, which means the request writes more like 1999 + 99 items, blowing it over the limit.
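In other words (a rough back-of-the-envelope using the per-item cost discussed further down; the constant names here are just for illustration):

```rust
fn main() {
    const SPANNER_MUTATION_LIMIT: u64 = 20_000;
    const MUTATIONS_PER_ITEM: u64 = 10; // 8 bso columns + 2 secondary-index mutations

    let trailing_items = 99; // items carried on the final commit request
    let batch_items = 1_999; // the full batch committed by the same handler

    // The handler writes the trailing items and then commits the whole batch in
    // the same transaction, so both land in a single mutation budget.
    let mutations = (trailing_items + batch_items) * MUTATIONS_PER_ITEM;
    assert!(mutations > SPANNER_MUTATION_LIMIT); // 20_980 > 20_000
}
```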

We’ll need further limit adjustments, or possibly an improvement to how final commits handle this situation.

@tublitzed We made it a config value: https://github.com/mozilla-services/syncstorage-rs/pull/319/files. I use a config file here because I couldn’t figure out a way to specify the nested value as an environment var. I believe @pjenvey noted a way it could be done.
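For what it’s worth, with the `config` crate nested keys can usually be mapped from environment variables via a separator. A minimal sketch, assuming a recent `config` version and hypothetical key names (the exact env var spelling depends on the prefix separator):

```rust
use config::{Config, ConfigError, Environment, File};
use serde::Deserialize;

// Hypothetical shapes; the real fields live in the project's Settings struct.
#[derive(Debug, Deserialize)]
struct Limits {
    max_total_records: u32,
}

#[derive(Debug, Deserialize)]
struct Settings {
    limits: Limits,
}

fn load() -> Result<Settings, ConfigError> {
    // With separator("__"), something like SYNC_LIMITS__MAX_TOTAL_RECORDS=1666
    // deserializes into settings.limits.max_total_records.
    Config::builder()
        .add_source(File::with_name("config/local").required(false))
        .add_source(Environment::with_prefix("SYNC").separator("__"))
        .build()?
        .try_deserialize()
}

fn main() -> Result<(), ConfigError> {
    let settings = load()?;
    println!("max_total_records = {}", settings.limits.max_total_records);
    Ok(())
}
```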

We also enforce the hard Spanner limit (not as a config value, because it’s a hard limit): https://github.com/mozilla-services/syncstorage-rs/pull/324/files
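A minimal sketch of what that enforcement amounts to, assuming hypothetical names (the hard ceiling itself is Spanner’s, not ours):

```rust
/// Spanner's hard per-transaction mutation ceiling; not configurable.
const MAX_SPANNER_MUTATIONS: u32 = 20_000;
/// Mutations per BSO write: 8 columns plus 2 secondary-index mutations.
const MUTATIONS_PER_BSO: u32 = 10;

/// Clamp the configured record limit so a full batch commit can never exceed
/// the Spanner ceiling (hypothetical helper, not the project's actual code).
fn effective_max_total_records(configured: u32) -> u32 {
    configured.min(MAX_SPANNER_MUTATIONS / MUTATIONS_PER_BSO)
}

fn main() {
    assert_eq!(effective_max_total_records(5_000), 2_000); // clamped to the ceiling
    assert_eq!(effective_max_total_records(1_666), 1_666); // left alone
}
```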

Batch commit also deletes the batch upon completion, adding an extra mutation or two. So our max_total_records should probably be 1998.

We need a db_test ensuring this limit; it should create two batches, one of the max size and one of max size + 1, and ensure commit returns what we expect.

If it’s long-running it could be disabled by default with #[ignore], but we have no Spanner on CI to ensure it doesn’t fail over time 😦
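Roughly the shape such a test could take; `Batch`, `create_batch_with_bsos`, and `commit_batch` below are hypothetical stand-ins for the real db test helpers, and the 1998 ceiling is the value discussed above:

```rust
// Hypothetical stand-ins so the sketch is self-contained; the real test would
// go through the db test utilities against an actual Spanner backend.
const MAX_TOTAL_RECORDS: usize = 1_998;

struct Batch {
    items: usize,
}

fn create_batch_with_bsos(n: usize) -> Batch {
    Batch { items: n }
}

fn commit_batch(batch: Batch) -> Result<(), String> {
    if batch.items <= MAX_TOTAL_RECORDS {
        Ok(())
    } else {
        Err("batch exceeds max_total_records".to_string())
    }
}

#[test]
#[ignore] // long-running and needs a Spanner instance; run with `cargo test -- --ignored`
fn batch_commit_respects_record_limit() {
    // A batch at the limit should commit cleanly...
    assert!(commit_batch(create_batch_with_bsos(MAX_TOTAL_RECORDS)).is_ok());

    // ...and one over should be rejected gracefully rather than surfacing a raw
    // Spanner "too many mutations" error.
    assert!(commit_batch(create_batch_with_bsos(MAX_TOTAL_RECORDS + 1)).is_err());
}
```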

The 20k here translates into 2000 items (1 mutation per column: the bsos table has 8 columns, plus 2 extra mutations for our secondary indexes, so 10 mutations per item).
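As a worked check of that arithmetic, including the extra batch-delete mutations mentioned earlier:

```rust
fn main() {
    const SPANNER_MUTATION_LIMIT: u64 = 20_000;
    // 1 mutation per bsos column (8) + 2 for the secondary indexes.
    const MUTATIONS_PER_ITEM: u64 = 8 + 2;

    // 20_000 / 10 = 2_000 items fit in a single transaction...
    assert_eq!(SPANNER_MUTATION_LIMIT / MUTATIONS_PER_ITEM, 2_000);

    // ...but commit also deletes the batch, costing an extra mutation or two,
    // which is why max_total_records ends up slightly lower, at 1_998.
    let max_total_records: u64 = 1_998;
    assert!(max_total_records * MUTATIONS_PER_ITEM + 2 <= SPANNER_MUTATION_LIMIT);
}
```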

It looks like the sync client will print the server’s limits in its log; can you search your log for “max_total_records”? Let’s confirm your client’s seeing a value of “2000”.

E.g. (from my log, probably non-spanner current sync prod):

```
1573781473132 Sync.Engine.Tabs TRACE new PostQueue config (after defaults): : {"max_request_bytes":2101248,"max_record_payload_bytes":2097152,"max_post_bytes":2101248,"max_post_records":100,"max_total_bytes":209715200,"max_total_records":10000}
```