twtxt: Suggestion: a convention for multiline statuses (line breaks)

I would like to propose a convention for multiline status updates, or newlines, in the twtxt format. The convention is backwards compatible with clients that do not support it. The convention is: when the client sees a sequence of statuses with the same timestamp, it joins their text with a newline. A feed following this convention looks reasonable in a client that does not understand it, as long as the client displays statuses with the same timestamp in the order they appear.

For example, twtxt currently renders

1845-01-29T12:00:00Z	Once upon a midnight dreary, while I pondered, weak and weary,
1845-01-29T12:00:00Z	Over many a quaint and curious volume of forgotten lore—
1845-01-29T12:00:00Z	    While I nodded, nearly napping, suddenly there came a tapping,
1845-01-29T12:00:00Z	As of some one gently rapping, rapping at my chamber door.
1845-01-29T12:00:00Z	“’Tis some visitor,” I muttered, “tapping at my chamber door—
1845-01-29T12:00:00Z	            Only this and nothing more.”

as

➤ http://127.0.0.1:8081/poe.txt (175 years ago):
Once upon a midnight dreary, while I pondered, weak and weary,

➤ http://127.0.0.1:8081/poe.txt (175 years ago):
Over many a quaint and curious volume of forgotten lore—

➤ http://127.0.0.1:8081/poe.txt (175 years ago):
While I nodded, nearly napping, suddenly there came a tapping,

➤ http://127.0.0.1:8081/poe.txt (175 years ago):
As of some one gently rapping, rapping at my chamber door.

➤ http://127.0.0.1:8081/poe.txt (175 years ago):
“’Tis some visitor,” I muttered, “tapping at my chamber door—

➤ http://127.0.0.1:8081/poe.txt (175 years ago):
Only this and nothing more.”

If support for this convention were implemented, twtxt could render the same file as

➤ http://127.0.0.1:8081/poe.txt (175 years ago):
Once upon a midnight dreary, while I pondered, weak and weary,
Over many a quaint and curious volume of forgotten lore—
While I nodded, nearly napping, suddenly there came a tapping,
As of some one gently rapping, rapping at my chamber door.
“’Tis some visitor,” I muttered, “tapping at my chamber door—
Only this and nothing more.”
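
The joining rule can be sketched in a few lines of Python. This is only an illustration of the convention as described above, not the twtxt.tcl implementation; the names `parse_line` and `merge_statuses` are made up for this example, and it assumes that each line is `timestamp<TAB>text` and that timestamps are compared as plain strings.

```python
from itertools import groupby


def parse_line(line):
    """Split a twtxt line into (timestamp, text) at the first tab."""
    timestamp, _, text = line.rstrip("\n").partition("\t")
    return timestamp, text


def merge_statuses(lines):
    """Join the text of consecutive statuses that share a timestamp.

    Assumption: timestamps are compared as strings, so the same instant
    written in two different ways would not be merged.
    """
    entries = (parse_line(line) for line in lines if line.strip())
    return [
        (timestamp, "\n".join(text for _, text in group))
        for timestamp, group in groupby(entries, key=lambda entry: entry[0])
    ]


feed = [
    "1845-01-29T12:00:00Z\tOnce upon a midnight dreary, while I pondered, weak and weary,",
    "1845-01-29T12:00:00Z\tOver many a quaint and curious volume of forgotten lore",
]
merged = merge_statuses(feed)
```

Because `groupby` only groups adjacent items, two statuses with the same timestamp that are separated by a status with a different timestamp stay separate.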

I have implemented the convention in my twtxt.tcl library and GUI feed reader. I have also made a page explaining it (pretty much like this issue does).

What do you think?

About this issue

  • State: open
  • Created 3 years ago
  • Reactions: 3
  • Comments: 18 (5 by maintainers)

Most upvoted comments

@prologic The recommendation of replacing the blake-hash was for the frugal feed you produce for every yarn.social feed. You can do whatever you want within the legacy yarn.social feed. Please also read the fine print again: I told you that I connect threads using both the timestamp and the user mention at the beginning of a post, and there can be no collision with that. Incidentally, this scheme was recommended to you years ago in your issue tracker by users, but you went with blake-hashes anyway and claimed that you liked the idea but couldn’t “change” it.

Recall the example I gave you in the past:

http://example.com/joke:

2022-10-31T06:54Z\tWhy do programmers confuse Halloween with Christmas?
2022-10-31T23:00Z\t@<http://example.com/lola> (2022-10-31T11:11Z) @<http://example.com/joke> (2022-10-31T06:54Z) Spot on! Oct 31 = Dec 25

http://example.com/lola:

2022-10-31T11:11Z\t@<http://example.com/joke> (2022-10-31T06:54Z) Something related to eight?

http://example.com/kids:

2022-10-31T22:22Z\t@<http://example.com/joke> (2022-10-31T06:54Z) Beats me

Note that I also support abbreviating user mentions of followers as @joke @lola but that’s beside the point here.
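
For illustration, here is one way a reader could extract the (feed URL, timestamp) pairs that open a reply in the example above. It is a rough sketch, not the commenter's actual code: the function name `leading_references` and the regular expression are assumptions based only on the shape of the sample lines.

```python
import re

# Assumed shape of one reference: "@<feed-url> (timestamp)", followed by
# optional whitespace. Several references may appear back to back.
REFERENCE = re.compile(r"@<(?P<url>[^>\s]+)>\s+\((?P<timestamp>[^)]+)\)\s*")


def leading_references(text):
    """Return ([(feed_url, timestamp), ...], remaining_text) for a status."""
    references = []
    position = 0
    while True:
        match = REFERENCE.match(text, position)
        if match is None:
            break
        references.append((match.group("url"), match.group("timestamp")))
        position = match.end()
    return references, text[position:]


refs, body = leading_references(
    "@<http://example.com/joke> (2022-10-31T06:54Z) Beats me"
)
```

Under this scheme the pair of feed URL and timestamp acts as the key of the post being replied to, so a timestamp only has to be unique within a single feed.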

This does not take into consideration the “network”. You cannot have a threading model whereby you either a) have to keep a global id somewhere (counter to a decentralised system) or b) have a high rate of collisions (such as a timestamp in one feed that collides with timestamps in all other feeds).

I mean to ask, would it be possible for you to switch from the character \u2028 to the string <br> as proposed by bkil in https://dev.twtxt.net/doc/multilineextension.html? It seems like a nice plain-text alternative in harmony with the existing use of @<example http://example.org/twtxt.txt>. It is compatible with the official client, visible in the editor, and easy to type.

I agree you should support Unicode on modern systems. On most old systems it is awesome if you do, but it is the norm that you don’t. A discussion about which retro and hobby operating systems are better would be out of place here.

If you used a standard library or compiler without UTF-8 support, you could also just split on the 3-byte sequence of e2 80 a8 and then be done with it.
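
As a small illustration of the byte-level approach, the sketch below splits the raw bytes of a status without decoding them. Python is used only for demonstration (it does have UTF-8 support); the name `split_status` is made up for this example.

```python
# UTF-8 encoding of U+2028 (LINE SEPARATOR): the bytes e2 80 a8.
LINE_SEPARATOR = b"\xe2\x80\xa8"


def split_status(raw_text):
    """Split undecoded status bytes into lines at the 3-byte separator."""
    return raw_text.split(LINE_SEPARATOR)


raw = "Line one\u2028Line two".encode("utf-8")
parts = split_status(raw)
```

The same idea carries over to any language that can search for a fixed byte sequence in a buffer.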

Yep.

Although, I personally recommended the 4-byte sequence of <br> in the past as a viable alternative, as I prefer to keep the text format easily editable and I’m not a fan of invisible markup either.

This would have probably been a better choice than \u2028. The line separator being plain text would avoid problems with existing clients. At worst it might get in the way of people quoting a bit of HTML, but I have never seen that in a twtxt feed.

@bkil:

This requires mandating the yarn.social extension of keeping the feed chronologically sorted past to future.

I am not suggesting that. My proposal only suggests that statuses with the same timestamp be displayed first-to-last-line when they are sequential in the file. This does not affect the overall structure of the file. It is fully backwards compatible.

This is a valid use of the convention:

1999-01-01T00:00:00 Foo
2023-01-01T00:00:00 Bar line 1
2023-01-01T00:00:00 Bar line 2
2014-01-01T00:00:00 Baz
2023-01-01T00:00:00 Qux (not merged with the Bar lines)

timestamps may no longer be used as a primary key.

Isn’t this already the case? Timestamps are not guaranteed to be unique.

@prologic:

The added complexity and burden on clients makes this proposal more difficult to adopt than a simple replacement of the Unicode new line code point \u2028.

This proposal’s advantage is that it does not require Unicode support on both the writer and the reader side, an editor that can preserve \u2028, or a way for the writer to input \u2028. For these reasons it is better suited for retrocomputing and for plain text editing.

In an imperative programming language the client burden amounts to tracking whether the previous line has the same timestamp as the current line when reading a text stream. It is greater than the burden of splitting the status text on \u2028, but not by much.


I’ll also note that this is not an either-or thing: a single client can support both this convention and \u2028.
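
To make that concrete, here is a rough sketch of a reader loop that handles both. It is not taken from any existing client: the name `read_statuses` is hypothetical, and the sketch assumes tab-separated lines and string comparison of timestamps.

```python
def read_statuses(lines):
    """Read twtxt lines, honouring both multiline conventions."""
    statuses = []
    previous_timestamp = None
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        timestamp, _, text = line.partition("\t")
        # Convention 1: U+2028 inside a status stands for a line break.
        text = text.replace("\u2028", "\n")
        if timestamp == previous_timestamp:
            # Convention 2: same timestamp as the previous line, so append
            # this text to the previous status instead of starting a new one.
            last_timestamp, last_text = statuses[-1]
            statuses[-1] = (last_timestamp, last_text + "\n" + text)
        else:
            statuses.append((timestamp, text))
        previous_timestamp = timestamp
    return statuses


result = read_statuses([
    "2023-01-01T00:00:00Z\tBar line 1",
    "2023-01-01T00:00:00Z\tBar line 2",
    "2014-01-01T00:00:00Z\tBaz\u2028continued",
])
```

The only state carried between lines is the previous timestamp, which is the extra burden described above.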