etherpad-lite: Uncaught Error: Failed assertion: Invalid changeset (checkRep failed)
Hey guys. We are using stable and have the problem that some pads randomly stop working and throw an uncaught error in the console.
Uncaught Error: Failed assertion: Invalid changeset (checkRep failed)
Example:
https://etherpad.tugraz.at/p/l3tsbet
When this happens, the “loading” overlay blocks any action. It’s unlikely to be a copy&paste issue because it sometimes happens to entirely handwritten pads.
Interestingly, the timeslider (opened by appending /timeslider to the URL) always works without problems.
https://etherpad.tugraz.at/p/l3tsbet/timeslider
Right now we are manually fixing the pads by exporting and re-importing them as HTML (losing all changesets). Any idea what's wrong?
About this issue
- State: closed
- Created 10 years ago
- Reactions: 1
- Comments: 95 (52 by maintainers)
Commits related to this issue
- Try fixing #2107 by removing checkRep — committed to ether/etherpad-lite by marcelklehr 10 years ago
- checkPadDeltas: version by JohnMcLear From https://github.com/ether/etherpad-lite/pull/3717#issuecomment-602179127 > Afaik I used async / await that's pretty much all, I think I had to do some > pol... — committed to JohnMcLear/etherpad-lite by JohnMcLear 4 years ago
Dude, the error is in your log!
See: https://github.com/ether/etherpad-lite/issues/3959
Indeed doing the “replace `????` by `??`” helped here as well. 😃 Seems like the last changeset was someone inserting an emoji (it ended in `$????`).
However, I do not understand why this is classified as a “minor bug”. This bug leads to total loss of a pad (until someone notices the `/timeslider` thing, which took a week in our case, and even then history is lost).
For me, `replace(value,'????','??')` has always worked so far. Hasn’t happened for a few months though.
Actually we have umlauts (äöü) in our pads all the time, which are also multi-byte in UTF-8. Based on what has been said above, I think the issue is actually about UTF-16, which, when originally designed, was intended to have exactly 2 bytes per character (codepoint, really). Now that there are more than 2^16 codepoints, some of them need 4 bytes, like emojis. As a result, `length` no longer matches the number of codepoints, and everything gets confused.
So maybe a better fix is to outright reject any surrogate pairs (4-byte codepoints)? That would make it impossible to use etherpad with characters from the supplementary planes, but that seems likely to be broken anyway. And it should protect the DB. There seem to be ways to test for surrogate pairs in JS (but I have zero experience in modern JavaScript).
The error can easily be reproduced by creating a new pad with a single emoji (e.g. 🐼) and restarting etherpad, see also #3340.
Update: As of April 2019, this single emoji itself doesn’t break a pad, even after restarting.
I am following the Phabricator ticket at Wikimedia but don’t have an account there, so I am posting it here.
Your second broken pad can also be repaired using:
The 4 question marks are the symptom because, iirc, the individual bytes of a four-byte UTF-8 sequence are not valid UTF-8 on their own. (In UTF-8 only the first 128 code points, 0x00 to 0x7f, are encoded as single bytes; every byte of a multibyte sequence is above 0x7f.) So the 4 question marks actually represent one character encoded as 4 bytes in UTF-8, which is a code point outside the Basic Multilingual Plane (most probably an emoji 😄). In JavaScript those code points are encoded as UTF-16 surrogate pairs, which are two 16-bit values.
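As a small illustration of the two encodings described above, the following sketch prints the UTF-8 bytes and the UTF-16 code units of one emoji. `utf8Bytes` and `utf16Units` are hypothetical helper names; `TextEncoder` is the standard API available in browsers and in current Node.js versions.

```javascript
// Hypothetical helpers for illustration only.
const encoder = new TextEncoder(); // TextEncoder always produces UTF-8

// UTF-8 bytes of a string, as plain numbers.
function utf8Bytes(text) {
  return Array.from(encoder.encode(text));
}

// UTF-16 code units of a string, which is what JavaScript strings store.
function utf16Units(text) {
  const units = [];
  for (let i = 0; i < text.length; i++) {
    units.push(text.charCodeAt(i));
  }
  return units;
}

const smiley = '\u{1F604}'; // U+1F604, outside the Basic Multilingual Plane

// Four UTF-8 bytes, every one of them above 0x7f.
console.log(utf8Bytes(smiley).map((b) => b.toString(16))); // [ 'f0', '9f', '98', '84' ]

// Two UTF-16 code units: a high and a low surrogate.
console.log(utf16Units(smiley).map((u) => u.toString(16))); // [ 'd83d', 'de04' ]
```

A storage layer that cannot represent the four-byte sequence and substitutes one `?` per byte would turn this single character into `????`, which matches the symptom reported in this thread.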
The checkRep problem is that in changesets we store not only the characters but also the length of the change. JavaScript’s `length` property, however, counts a surrogate pair as 2, so e.g. an emoji has length 2. When MySQL decodes the string of a changeset to question marks, our representation of the changeset is no longer valid: the four question marks have length 4, while the changeset still records a length of 2.
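The mismatch can be sketched with a deliberately simplified model. This is not Etherpad's changeset format or its real `checkRep`; `isConsistent`, `simulateLossyStorage`, and the record shape are assumptions made up for illustration. It only shows why a stored length of 2 stops matching after the emoji comes back as `????`, and why replacing `????` with `??` makes the lengths agree again.

```javascript
// Simplified stand-in for the consistency check; NOT Etherpad's real checkRep.
// A record claims how many UTF-16 code units its text contains.
function isConsistent(record) {
  return record.text.length === record.declaredLength;
}

// Assumed model of the lossy round trip described in this thread: every
// code point above U+FFFF comes back as four literal question marks.
function simulateLossyStorage(text) {
  return Array.from(text, (ch) => (ch.codePointAt(0) > 0xffff ? '????' : ch)).join('');
}

const original = 'hi \u{1F43C}'; // 3 BMP characters plus one emoji (2 code units)
const record = { text: original, declaredLength: original.length }; // length 5

// After the round trip the text is 'hi ????' (7 units) but the record says 5.
const reloaded = { ...record, text: simulateLossyStorage(record.text) };

// The workaround from this thread: '????' -> '??' restores the unit count,
// but the original character is still lost.
const repaired = { ...reloaded, text: reloaded.text.replace('????', '??') };

console.log(isConsistent(record));   // true
console.log(isConsistent(reloaded)); // false
console.log(isConsistent(repaired)); // true
```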
Replacing them with two question marks is a hack, not a real solution, because we have no idea which code point the user entered in the first place (but as long as the value is in ueberDB’s cache we could get it out from there).
The info I need is whether your database is utf8 or utf8mb4. I have extracted the offending changeset; the timeslider will not work around these revisions if you don’t also apply
together with the updates from above this should make your pad reusable again
Hi, I had this discussion in the morning.
08:47 < webzwo0i> mutante: i debugged the pad. you normally should not do this, but if you have a database backup (and after making export/etherpad to have a backup of the pad) you can change three database entries and the pad should be working fine again. the three mysql commands are
08:49 < webzwo0i> can you check if your database is running utf8mb4 charset or utf8?
08:51 < webzwo0i> oha ups please dont apply the mysql-commands. maybe i was a little bit to fast 😃 need to check something first
09:08 < webzwo0i> mh nope should be fine, please test… it would be good to know if you ran latest release or something else and which plugins you have enabled
09:47 < webzwo0i> I don’t know if you ppl at wikimedia know each other but if you can find out who the user “Brian” is could you ask him what browser he is using? the reason is I can see what the bug is, but I cannot trigger it in my browser (only manually, but because you ppl are not hostile it was probably not on purpose)
09:49 < webzwo0i> (so we probably have two bugs, one server-side and one client-side)