peritext: What could be the direction for making Peritext support block elements

This is not an issue. Just want to pick the team’s brain on what’s the perspective on dealing with rich text elements with “layouts” like tables and lists. Kudos to the team, Peritext has come up with an efficient representation for holding both character data as well as formatting boundaries.

Handling block elements is a different beast altogether. I’ll take tables for clarity as it’s the most common block element, yet supports most complex operations. Here are the options I see with Peritext’s current representation.

  1. Encode blocks as special control character. For example a table will have control characters for <table-start>, <table-end>, <row-start>, <column-start>, … etc.
[

  { char: "T", opId: "9@B", deleted: false, markOpsBefore: [
    {
      action: "addMark",
      opId: "19@A",
      start: { type: "before", opId: "9@B" },
      end:   { type: "before", opId: "10@B" },
      markType: "bold"
    }
  ] },
  { char: "t", opId: "1@A", deleted: true },
  { char: "h", opId: "2@A", deleted: false },
  { char: "e", opId: "3@A", deleted: false },
  { char: " ", opId: "4@A", deleted: false },
  { char: "f", opId: "5@A", deleted: false },
  { char: "o", opId: "6@A", deleted: false },
  { char: "x", opId: "7@A", deleted: false },
  { char: " ", opId: "10@B", deleted: false, markOpsBefore: [] },
  
  { char: "<table-start-char>", opId: "11@A", deleted: false },
   //..... (rest of the table body with content interweaving table control characters)
  { char: "<table-end-char>", opId: "12@A", deleted: false },

]
  • Makes formatting model play along nicely. But as quoted in the original article it takes significant complexity to tailor it for our desired outcomes. Keeps the base model simple but shifts the complexity to handling outcomes.
  1. Encode tables as trees (i.e semantic JSON) and use automerge’s/custom JSON capabilities.
[

  { char: "T", opId: "9@B", deleted: false, markOpsBefore: [
    {
      action: "addMark",
      opId: "19@A",
      start: { type: "before", opId: "9@B" },
      end:   { type: "before", opId: "10@B" },
      markType: "bold"
    }
  ] },
  { char: "t", opId: "1@A", deleted: true },
  { char: "h", opId: "2@A", deleted: false },
  { char: "e", opId: "3@A", deleted: false },
  { char: " ", opId: "4@A", deleted: false },
  { char: "f", opId: "5@A", deleted: false },
  { char: "o", opId: "6@A", deleted: false },
  { char: "x", opId: "7@A", deleted: false },
  { char: " ", opId: "10@B", deleted: false, markOpsBefore: [] },
  
  {layout: "table", opId: "11@B", deleted: false, rows: [/* other table components */]} // special layout JSON
]
  • Adds additional basic units to the model but plays along well with existing CRDTs that support trees/JSON. Doesn’t mean handling outcomes is any easier. Might take significant complexity to deal with operations that might result in an un-renderable table structure.

None of the options might make sense actually. It’s highly possible there are other better directions to take that handles semantic tree elements like tables and lists, which the team might already have a hang on. I’d be happy to wrap my head around what’s cooking and maybe help in someway.

CRDTs for working with non-schematic JSON is fairly straightforward. CRDTs for schematic JSON like a SQL database is slightly tricky but the operation types (insert row, insert column, update cell) are too simple to warrant a complex solution. But tables in a WYIWYG editors are the most complex of them all with operations like cell-merge and cell-splitting. As far as I’ve studied none of the existing CRDT literature touches on this particular topic. Given Peritext’s direction this project might be the only one, that throws light in this area.

About this issue

Most upvoted comments

My general opinion on this is to do what @kythyria suggested and use a separate (embedded) data structure. This is also what we’ve been talking about in the braid working group. And its what we ended up doing for sharedb, which was inspired by sharejs, which was inspired by my work on Wave (and talking to some of the engineers on that after the project was over).

So I imagine something like this: We have a few ‘primitive’ data structures:

  • CRDT LWW (or, arbitrary-writer wins) register and keep-all-conflicting-values register
  • CRDT map / set / JSON objects.
  • List / plaintext CRDT (what I’m building at the moment in diamond types )
  • Rich text CRDT (this is a list CRDT with support for annotation ranges)
  • … And maybe others.

Then the idea is that each value in the system is another CRDT value. Registers, maps, lists and rich text CRDTs can embed children inside, which are themselves registers, maps, lists and more rich text documents. And so on. And you could do all this either in a dynamically typed way (like JSON) or with a schema.

Then if you have a list of todo items, each item can be JSON value with a few fields. One of the fields is a string description, which is implemented using a CRDT designed for editing text. If the UI supports it, the todo list itself could be embedded or linked into a rich text document. It would take up length=1 like any other item (character) in the rich text document.

If tables need special handling, they can be implemented using any of the above tools. Or someone could build special table semantics using a different construction if they want. And then tables can recursively support the same feature set - because what do table cells contain? Another embedded rich text CRDT.

Are tables really that troublesome? Trying to model tabular data as a stream of tokens or a JSON structure that looks like a table (like an array-of-arrays) is troublesome, but this is over-constraining the problem. I helped design a convergent data model for tables at Notion recently that would work well using 3 convergent data types: Map (to group and address fields), Ordered Set (for defining the order of rows and the order of columns), and Rich Text (for defining the contents of cells).

The structure looks like this, in pseudocode:

type Table = {
  rowOrder: OrderedSet<RowId>
  columnOrder: OrderedSet<ColumnId>
  rows: Map<RowId, Row>
}

type Row = {
  cells: Map<ColumnId, RichText>
}

Map here is used for addressing, not for any kind of convergence property. You can sprinkle on some extra last-write-wins maps w/ Lamport clocks for row, column, and cell format. Use Lamport clocks for row and column IDs so that rows/columns not included in the ...Order fields sort reasonably.

When considering tradeoffs of various approaches here, I think it’s worth highlighting that splitting/joining across blocks seems like an important requirement, at least for a “word processor” use case where a document feels like one contiguous thing. (A blocks-style UI like Notion might have different requirements.)

Let’s say we treat paragraphs as a kind of block element. Imagine we start with a one-paragraph document:

I saw a quick fox. It ran across the street.

Alice and Bob are concurrently editing. Alice splits the paragraph between the two sentences:

I saw a quick fox.
---
It ran across the street.

Concurrently, Bob adds some text to the end of the paragraph:

I saw a quick fox. It ran across the street and jumped across the fence.

The best merge result is pretty clearly:

I saw a quick fox.
---
It ran across the street and jumped across the fence.

However, if we interpret “paragraph split” as “delete some characters from the paragraph and insert a new paragraph containing that text”, then we’d probably end up with Bob’s insertion showing up in the original first paragraph, like this:

I saw a quick fox. and jumped across the fence
---
It ran across the street.

This is kind of a similar issue to the problems with a JSON object of format spans in the Peritext essay. It seems relevant for paragraphs, list elements, headings–anything that we might treat as a block element. Anytime two users concurrently split a block element in different places, or concurrently split + insert, it seems like we’ll run into issues. FWIW, the yjs prosemirror integration demonstrates these kinds of problems because it treats each paragraph as a separate object.

I think the key advantage of Martin’s proposal is that it handles these kinds of cases nicely by working with boundary characters rather than isolated objects.


ASTs of programming languages follow a specific grammar, which makes the problem even more interesting. This means any CRDT that solves this problem will be portable to code editors and IDEs as well.

There is indeed a connection here; e.g. Prosemirror allows defining schemas governing what kinds of configurations of elements comprise a valid document. However I’m quite wary of trying to support complicated schemas on block elements because many constraints seem very difficult to support in a CRDT. For example, what if a parent can only contain a single child of a particular type, and two users concurrently add a child of that type? Maybe there are ways to solve that particular problem, but the general case seems hard.

I have written up a proposal for handling block elements in Peritext. It’s not yet implemented or tested on users, so it may not be ideal, but I think it’s a reasonably plausible way forward.

I’ve been thinking about the nature of this problem and stumbled upon this generalization. What makes WYIWYG tables so difficult to handle? They are JSON objects with a grammar. i.e the arrangements of nodes within this tree has to follow a pattern to be valid. Now, what other trees with grammar do we deal with everyday as a programmer? you might have guessed! ASTs.

ASTs of programming languages follow a specific grammar, which makes the problem even more interesting. This means any CRDT that solves this problem will be portable to code editors and IDEs as well. Simply put, conflict-less merging strategy that always keep the source code syntactically valid at all times. Hopefully this puts a bigger picture for this problem and doesn’t distract the goals of Peritext 😃

… I feel scooped, I was just in the middle of an edit to describe a similar structure. I guess it is obvious after all. That it works for merged cells isn’t to me: if you go with rowspan and colspan, you can have two cells trying to occupy the same space, and merging has to fuse the cell contents without edits being lost or resurrecting the removed cell.

Yeah your nitpick is correct. Same basic idea either way though - the core idea being essentially making newlines as characters.

Not gonna disagree on markup trees being a giant pain though.

The worst part of markup trees is doing formatting annotations - which in wave were also done separately from the XML structure. Peritext has a markup system quite similar to wave’s approach.

The other awful part of wave’s XML approach was that we tried to embed lots of different data types into / on top of our XML structure. For example, 3rd party widgets (“gadgets”) would be handed a subtree of a wave to use to store their data. But gadgets wanted JSON, and doing concurrent maps and sets on top of a concurrent XML tree was awful. Much better to just (as I suggested upthread) embed an OT / CRDT object specific to the desired data semantics in place.

I don’t know much about CKEditor, but I’m convinced all sorts of OT / CRDT design mistakes have been made over the years by people who haven’t bothered with fuzzers. All OT / CRDT algorithms I’ve ever designed or implemented have had mistakes in various ways that fuzzers have found. Many of these problems were quite fundamental (like the splitting / merging lines issue).

Oh, and being reminded of Google Wave reminded me of something else of its data model: At one level, it was basically an outliner: a tree of nodes each of whose data was a rich text document. Which can be used for a more hierarchical version of Jupyter-esque notebooks when you think about it: a mix of nodes with rich text, and nodes that hold data or code for a computation.

Not very traditional-wysiwyg-editor like, though, even if it does potentially let you casually drag the nodes around or skin them as a forum thread or use them as an outliner.

@joelewis

@justjake Interesting take. But I’m trying to wrap my head around how will a merged cell or a split-cell look like with this data model. Something like this image

Any hints?

I consider cell “merge” a format property of the cell - as rowspan and colspan format properties at @kythyria suggests. To merge a [ left cell, right cell ] into a [ two column cell ], you copy the text content of right cell to the end of left cell’s RichText, and then increment left cell’s colspan format. Our data model needs this extension to support format attributes:

type Row = {
  cellFormat: Map<ColumnId, CellFormat>
  cells: Map<ColumnId, RichText>
}

// Each of the properties of the CellFormat object is a last-write-wins register defined by
// the write with the "latest" lamport clock
type CellFormat = LastWriteWinsObject<{
  rowspan?: number
  colspan?: number
  // You can store other stylistic info for the whole cell here, too.
  color?: string
  background?: string
}>

@kythyria

That it works for merged cells isn’t to me: if you go with rowspan and colspan, you can have two cells trying to occupy the same space, and merging has to fuse the cell contents without edits being lost or resurrecting the removed cell […]

A naive renderer might do something strange when cells with >2 rowspan or colspan overlap. I think this is solvable with some precedence rules for stacking the rendered cells:

  • Cells stack top to bottom with the first row the highest, and left to right with the first column the highest. The cell at { row: 0, col: 0 } stacks above all other cells. The cell at { row: 0, col: 1, rowspan: 2 } would stack over the cell { row: 1, col: 0, colspan: 2 }.
  • If a cell’s origin point (it’s RowId and ColumnId) is covered by another cell, don’t render the covered cell at all. This resolves a confusing ambiguity where you can see some data, but it’s not clear how to control the cell.

Overall, this scheme does allow cells that were previously “covered” to be “resurrected”, but I don’t think that’s necessarily a bad thing.

@ept

I have written up a proposal for handling block elements in Peritext. It’s not yet implemented or tested on users, so it may not be ideal, but I think it’s a reasonably plausible way forward.

The tables in Martin’s design (https://martinkl.notion.site/Block-elements-in-rich-text-CRDT-a3b69f886dbc4ad1abe81cea0b3e6623#56ec0311dc86439fa1fa791607e05b5b) basically equivalent to the design we’re discussing - the only difference is the addressing scheme; I intended my Row/Column addressing to be equivalent to Map<(RowId, ColumnId), Cell>.

But, Martin suggests using a different scheme for larger cells:

It’s not immediately obvious how to handle table cells that span several rows or columns. Perhaps they should be identified by a two-dimensional range (startRowId, startColumnId, endRowId, endColumnId)? However, that structure would risk users concurrently creating cells that partially overlap (for example, one cell that spans columns A and B, and another cell that spans columns B and C). Perhaps this is a sufficiently niche edge-case that it’s okay to resolve such conflicts by last-write-wins.

I think this would lead to counterintuitive rendering for larger cells if rows or columns are re-ordered - if a column goes from being the second column to the last column, a colspan=2 cell in the first column suddenly covers the whole table. I think rendering stacking cells with a subtle UI affordance might be more intuitive.