etherpad-lite: Uncaught Error: Failed assertion: Invalid changeset (checkRep failed)

Hey guys. We are using stable and have the problem that some pads randomly stop working and throw an uncaught error in the console.

Uncaught Error: Failed assertion: Invalid changeset (checkRep failed) 

Example:

https://etherpad.tugraz.at/p/l3tsbet

When this happens, the “loading” overlay blocks any action. It’s unlikely to be a copy&paste issue because it sometimes happens to entirely handwritten pads.

Interestingly, the timeslider (opened by appending /timeslider to the URL) always works without problems.

https://etherpad.tugraz.at/p/l3tsbet/timeslider

Right now we are manually fixing the pads by exporting and re-importing them as HTML (which loses all changesets). Any idea what’s wrong?

About this issue

  • State: closed
  • Created 10 years ago
  • Reactions: 1
  • Comments: 95 (52 by maintainers)

Most upvoted comments

Dude, the error is in your log!

[2020-05-05 00:04:12.541] [ERROR] console - table is not configured with charset utf8 -- This may lead to crashes when certain characters are pasted in pads
[2020-05-05 00:04:12.543] [INFO] console - RowDataPacket { character_set_name: 'utf8mb4' } utf8

See: https://github.com/ether/etherpad-lite/issues/3959

Indeed doing the “replace ???? by ??” helped here as well. 😃 Seems like the last changeset was someone inserting an emoji (it ended in $????).

However, I do not understand why this is classified as a “minor bug”. This bug leads to total loss of a pad (until someone notices the /timeslider thing, which took a week in our case, and even then history is lost).

For me, replace(value,'????','??') has always worked so far. Hasn’t happened for a few months though.

FWIW, this bug appears to be due to a limitation of the easysync library, which I’m speculating does not support all of UTF-8. (UTF-8 may encode one character as multiple bytes, which each add to the length of a string in JavaScript, even though it’s just one character.)

Actually we have umlauts (äöü) in our pads all the time, which are also multi-byte in UTF-8. Based on what has been said above, I think the issue is actually about UTF-16, which was originally designed to use exactly 2 bytes per character (code point, really). Now that there are more than 2^16 code points, some of them, like emojis, need 4 bytes. As a result, a string’s length no longer matches the number of code points, and everything gets confused.
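To make the umlaut-versus-emoji difference concrete, here is a small sketch in plain JavaScript. The helper name `describeLength` is made up for illustration and is not part of Etherpad.

```javascript
// Compare what .length counts (UTF-16 code units) with the number of
// code points in the same string.
function describeLength(text) {
  return {
    codeUnits: text.length,              // UTF-16 code units
    codePoints: Array.from(text).length, // Array.from iterates by code point
  };
}

const umlaut = describeLength('\u00E4');   // ä, U+00E4, inside the BMP
const panda = describeLength('\u{1F43C}'); // 🐼, U+1F43C, outside the BMP

console.log(umlaut); // { codeUnits: 1, codePoints: 1 }
console.log(panda);  // { codeUnits: 2, codePoints: 1 }
```

So an umlaut is still one unit as far as JavaScript is concerned, while the emoji counts as two.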

So maybe a better fix is to outright reject any surrogate pairs (4-byte codepoints)? That would make it impossible to use etherpad with characters from the supplementary plane, but that’s likely broken anyways it seems? And it should protect the DB. There seem to be ways to test for surrogate pairs in JS (but I have zero experience in modern JavaScript).
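For what it’s worth, testing for surrogate pairs in JavaScript could look roughly like the sketch below. Both function names are hypothetical, and where such a check would hook into Etherpad is not something this thread establishes.

```javascript
// True if the string contains any UTF-16 surrogate code unit, i.e. a
// character outside the BMP (or a lone, malformed surrogate).
// Without the "u" flag the regex matches individual code units.
function containsSurrogates(text) {
  return /[\uD800-\uDFFF]/.test(text);
}

// Hypothetical guard: refuse such text before it is stored anywhere.
function assertBmpOnly(text) {
  if (containsSurrogates(text)) {
    throw new RangeError('text contains characters outside the BMP');
  }
  return text;
}

console.log(containsSurrogates('\u00E4\u00F6\u00FC')); // false (äöü)
console.log(containsSurrogates('hi \u{1F43C}'));       // true
```

Rejecting input this way trades away supplementary-plane characters entirely, which is the downside already mentioned above.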

The error can easily be reproduced by creating a new pad with a single emoji (e.g. 🐼) and restarting etherpad, see also #3340.

Update: As of April 2019, this single emoji itself doesn’t break a pad, even after restarting.

I am following the phabricator ticket at wikimedia but don’t have an account there so I post it here.

Your second broken pad can also be repaired using:

update `store` set `value` = replace(`value`,'????','??') where `key` like "pad:iOS_Retrospectives:revs:1120";
update `store` set `value` = replace(`value`,'????','??') where `key` like "pad:iOS_Retrospectives:revs:1254";
update `store` set `value` = replace(`value`,'????','??') where `key` like "pad:iOS_Retrospectives:revs:1216";
update `store` set `value` = replace(`value`,'????','??') where `key` like "pad:iOS_Retrospectives:revs:1108";
update `store` set `value` = replace(`value`,'????','??') where `key` like "pad:iOS_Retrospectives:revs:1106";
update `store` set `value` = replace(`value`,'????','??') where `key` like "pad:iOS_Retrospectives:revs:1200";
update `store` set `value` = replace(`value`,'????','??') where `key` like "pad:iOS_Retrospectives:revs:1300";
update `store` set `value` = replace(`value`,'????','??') where `key` like "pad:iOS_Retrospectives:revs:1400";
update `store` set `value` = replace(`value`,'????','??') where `key` like "pad:iOS_Retrospectives:revs:1500";
update `store` set `value` = replace(`value`,'????','??') where `key` like "pad:iOS_Retrospectives:revs:1600";
update `store` set `value` = replace(`value`,'????','??') where `key` like "pad:iOS_Retrospectives:revs:1700";
update `store` set `value` = replace(`value`,'????','??') where `key` like "pad:iOS_Retrospectives:revs:1800";
update `store` set `value` = replace(`value`,'????','??') where `key` like "pad:iOS_Retrospectives:revs:1900";
update `store` set `value` = replace(`value`,'????','??') where `key` like "pad:iOS_Retrospectives:revs:2000";
update `store` set `value` = replace(`value`,'????','??') where `key` like "pad:iOS_Retrospectives:revs:2100";
update `store` set `value` = replace(`value`,'????','??') where `key` like "pad:iOS_Retrospectives:revs:2200";
update `store` set `value` = replace(`value`,'????','??') where `key` like "pad:iOS_Retrospectives:revs:2300";
update `store` set `value` = replace(`value`,'????','??') where `key` like "pad:iOS_Retrospectives:revs:2400";
update `store` set `value` = replace(`value`,'????','??') where `key` like "pad:iOS_Retrospectives";
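Since the statements above all follow one pattern, they could be generated from a list of revision numbers. This is only a sketch: `buildRepairStatements` is a hypothetical helper, and which revisions need repairing still has to be worked out per pad. It uses `=` instead of `like`, because in MySQL `LIKE` treats `_` as a single-character wildcard, so a pad name such as iOS_Retrospectives could match other keys. Make a database backup before running any of the output.

```javascript
// Hypothetical helper: builds the repair statements for one pad.
function buildRepairStatements(padId, revisions) {
  // Keep the example safe: only allow simple pad names and integer revisions.
  if (!/^[A-Za-z0-9_-]+$/.test(padId)) {
    throw new Error('unexpected pad id: ' + padId);
  }
  const keys = revisions.map((rev) => {
    if (!Number.isInteger(rev) || rev < 0) {
      throw new Error('unexpected revision: ' + rev);
    }
    return 'pad:' + padId + ':revs:' + rev;
  });
  keys.push('pad:' + padId); // the pad record itself

  return keys.map((key) =>
    "update `store` set `value` = replace(`value`,'????','??') " +
    "where `key` = '" + key + "';");
}

const statements = buildRepairStatements('iOS_Retrospectives', [1120, 1254]);
statements.forEach((sql) => console.log(sql));
```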

The 4 question marks are the symptom because, iirc, the individual bytes of a four-byte UTF-8 sequence are not valid UTF-8 on their own. (In UTF-8 only the first 128 code points are encoded as single bytes; every byte of a multi-byte sequence is above 0x7f.) So 4 question marks actually represent one 4-byte UTF-8 sequence, which encodes a code point outside the Basic Multilingual Plane (most probably an emoji 😄). In JavaScript those code points are encoded as UTF-16 surrogate pairs, which are two 16-bit values.
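The byte counts can be checked with the standard `TextEncoder`. The last step, turning each byte into a question mark, is only a model of the symptom described here, not MySQL’s actual code path; `utf8Bytes` is a name made up for this sketch.

```javascript
// Return the UTF-8 bytes of a string as a plain array of numbers.
function utf8Bytes(text) {
  return Array.from(new TextEncoder().encode(text));
}

const asciiBytes = utf8Bytes('a');          // [0x61]
const umlautBytes = utf8Bytes('\u00E4');    // ä  -> [0xc3, 0xa4]
const pandaBytes = utf8Bytes('\u{1F43C}');  // 🐼 -> [0xf0, 0x9f, 0x90, 0xbc]

// Every byte of a multi-byte sequence is above 0x7f.
const allHigh = pandaBytes.every((b) => b > 0x7f);

// Model of the symptom: one '?' per byte that could not be stored.
const mangled = pandaBytes.map(() => '?').join('');

console.log(pandaBytes.length, allHigh, mangled); // 4 true ????
```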

The checkRep problem is that in changesets we store not only the characters but also the length of the change. JavaScript’s length property, however, counts a surrogate pair as 2, so e.g. an emoji has length 2. When MySQL decodes the string of a changeset to four question marks, the stored length no longer matches the text and our representation of the changeset is no longer valid.
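A toy model of that mismatch, which also shows why swapping ???? for ?? makes the pad load again. This is deliberately not the real easysync changeset format or the real checkRep; `makeInsert` and `isConsistent` are invented for the illustration.

```javascript
// Toy changeset: records how many UTF-16 code units are inserted,
// plus the inserted text itself.
function makeInsert(text) {
  return { length: text.length, chars: text };
}

// Stand-in for the consistency check: recorded length must match the text.
function isConsistent(change) {
  return change.length === change.chars.length;
}

const original = makeInsert('hi \u{1F43C}'); // length 5: 3 + surrogate pair

// Model of the database turning each surrogate pair into '????'.
const mangled = {
  ...original,
  chars: original.chars.replace(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g, '????'),
};

// The workaround from this thread: '????' -> '??' restores the length.
const repaired = {
  ...mangled,
  chars: mangled.chars.replace(/\?\?\?\?/g, '??'),
};

console.log(isConsistent(original)); // true
console.log(isConsistent(mangled));  // false (7 characters, length says 5)
console.log(isConsistent(repaired)); // true, but the emoji itself is gone
```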

Replacing it with two question marks is a hack, not a real solution, because we have no idea which code point the user originally entered (although as long as the value is still in ueberDB’s cache, we could recover it from there).

The info I need is whether your database is utf8 or utf8mb4. I have extracted the offending changeset; the timeslider will not work around these revisions unless you also apply:

mysql> update `store` set `value` = replace(`value`,'????','??') where `key` like "pad:iOS-iteration-planning:revs:7105";

Together with the updates from above, this should make your pad usable again.

Hi, I had this discussion in the morning:

08:47 < webzwo0i> mutante: i debugged the pad. you normally should not do this, but if you have a database backup (and after making export/etherpad to have a backup of the pad) you can change three database entries and the pad should be working fine again. the three mysql commands are

mysql> update `store` set `value` = replace(`value`,'????','??') where `key` like "pad:iOS-iteration-planning";
mysql> update `store` set `value` = replace(`value`,'????','??') where `key` like "pad:iOS-iteration-planning:revs:7200";
mysql> update `store` set `value` = replace(`value`,'????','??') where `key` like "pad:iOS-iteration-planning:revs:7300";

08:49 < webzwo0i> can you check if your database is running utf8mb4 charset or utf8?
08:51 < webzwo0i> oha ups please dont apply the mysql-commands. maybe i was a little bit to fast 😃 need to check something first
09:08 < webzwo0i> mh nope should be fine, please test… it would be good to know if you ran latest release or something else and which plugins you have enabled
09:47 < webzwo0i> I don’t know if you ppl at wikimedia know each other but if you can find out who the user “Brian” is could you ask him what browser he is using? the reason is I can see what the bug is, but I cannot trigger it in my browser (only manually, but because you ppl are not hostile it was probably not on purpose)
09:49 < webzwo0i> (so we probably have two bugs, one server-side and one client-side)