qoi: Upcoming breaking changes & locking in the data format
Saying that I’m surprised by the amount of attention this is getting would be an understatement. There’s lots of discussion going on about how the data format and compression could be improved and what features could be added.
I want to give my views here and discuss how to go forward.
First and foremost, I want QOI to be simple. Please keep this in mind. I consider the general compression scheme to be done. There’s lots of interesting ideas on how to improve compression. I want to tinker with these ideas - but not for QOI.
QOI will not be versioned. There will only be one version of QOI’s data format. I’m hoping we will be able to strictly define what exactly that is in the coming days.
QOI will only support 24bit RGB and 32bit RGBA data. I acknowledge there’s some need for fewer or more channels and also for higher bit depths or paletted color - QOI will not serve these needs.
So, with all that said, there’s some breaking changes that are probably worthwhile. I want to discuss if and how to implement those.
Proposed changes
1. width, height and size in the header should be stored as big endian for consistency with the rest of the format (this change already happened in https://github.com/phoboslab/qoi/commit/c03edb2f2658c13b5f84ee59e3469e0b0b6eb5d1)
2. Color differences (QOI_DIFF_*) should have the same range as two’s-complement. That means:
   - 2bit: -2..1 instead of the current range -1..2
   - 4bit: -8..7 instead of the current range -7..8
   - 5bit: -16..15 instead of the current range -15..16
3. The header should accommodate some more info. Currently there’s demand for
   - 3a) number of channels (#16)
   - 3b) the colorspace (#25 and this huge discussion on HN)
   - 3c) un-/premultiplied alpha (#13)
   - 3d) user-defined values
So, 1) is already implemented; 2) seems like the right thing to do (any objections?); 3) is imho worth discussing.
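To make point 2) concrete, here is a minimal sketch of how a 2-bit difference in the proposed range -2..1 could be stored and read back. The helper names are hypothetical (they are not from qoi.h), and storing the value with a bias of 2 is an assumption for illustration; the proposal itself only fixes the range.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; these names are not from qoi.h. */

/* Returns 1 if d fits the proposed 2-bit range -2..1, else 0. */
static int diff2_fits(int d) {
    return d >= -2 && d <= 1;
}

/* Maps -2..1 onto the unsigned 2-bit values 0..3.
   The bias of 2 is an assumption made for this sketch. */
static uint8_t diff2_store(int d) {
    return (uint8_t)(d + 2);
}

/* Inverse of diff2_store: maps 0..3 back onto -2..1. */
static int diff2_load(uint8_t bits) {
    return (int)(bits & 0x03) - 2;
}
```

The 4-bit and 5-bit cases would follow the same pattern with a bias of 8 and 16 respectively.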
3a) Storing the number of channels (3 or 4) in the header would allow a user of this library to omit whether they want RGB or RGBA, and files would be more descriptive of their contents. You would still be able to enforce 3 or 4 channels when loading. This is consistent with what stbi_load does:
int x,y,n;
unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
// ... process data if not NULL ...
// ... x = width, y = height, n = # 8-bit components per pixel ...
// ... replace '0' with '1'..'4' to force that many components per pixel
// ... but 'n' will always be the number that it would have been if you said 0
It is my opinion that the channels header value should be purely informative. Meaning, en-/decoder will do exactly the same, regardless of the number of channels. The extra 5bit for alpha in QOI_DIFF_24 will still be wasted for RGB files.
3b) I don’t understand enough about the colorspace issue to gauge the significance. If we implement this however, I would suggest to give this a full byte in the header, where 0 = sRGB and any non-zero value is another, user-defined(?) colorspace.
3c) I’m against an option for premultiplied alpha, because it puts more burden on any QOI implementation to decode in the right pixel format. We should just specify that QOI images have un-premultiplied alpha.
3d) For simplicity’s sake I’d like to put 3a) and 3b) as one byte each into the header. I’m uncertain if we then should “pad” the u32 size in the header with two more bytes. This would make the size 4byte aligned again, but there’s probably no need for it!? A u16 unused could also cause more confusion when other QOI libraries suddenly specify any of these bits to mean something.
With all this, the header would then be the following 16 bytes:
struct qoi_header_t {
char magic[4]; // magic bytes "qoif"
u16 width; // image width in pixels (BE)
u16 height; // image height in pixels (BE)
u8 channels; // must be 3 (RGB) or 4 (RGBA)
u8 colorspace; // 0 = sRGB (other values currently undefined)
u16 unused; // free for own use
u32 size; // number of data bytes following this header (BE)
};
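As a rough illustration of how a decoder might read this proposed layout, the sketch below parses the 16 bytes field by field instead of casting the buffer to a struct, so it does not depend on host endianness or struct padding. The type and function names are hypothetical, and reading the unused field as big endian is an assumption, since the proposal does not say how it is stored.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory mirror of the proposed 16-byte header. */
typedef struct {
    uint16_t width;
    uint16_t height;
    uint8_t  channels;
    uint8_t  colorspace;
    uint16_t unused;
    uint32_t size;
} proposed_header;

/* Read a big-endian 16-bit value from two bytes. */
static uint16_t read_be16(const uint8_t *p) {
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Read a big-endian 32-bit value from four bytes. */
static uint32_t read_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 1 on success, 0 if the magic bytes or channel count are invalid. */
static int parse_proposed_header(const uint8_t bytes[16], proposed_header *out) {
    if (memcmp(bytes, "qoif", 4) != 0) {
        return 0;
    }
    out->width      = read_be16(bytes + 4);
    out->height     = read_be16(bytes + 6);
    out->channels   = bytes[8];
    out->colorspace = bytes[9];
    out->unused     = read_be16(bytes + 10); /* byte order is an assumption */
    out->size       = read_be32(bytes + 12);
    return out->channels == 3 || out->channels == 4;
}
```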
The one issue I have with this is how to give these extra header values to the user of this library. qoi_read("file.qoi", &w, &h, &channels_in_file, &colorspace, want_channels) looks like an ugly API. So maybe that would rather be implemented as qoi_read_ex() and qoi_read() stays as it is. I’m still not sure if I want that extended header…
What’s the opinion of the other library authors?
About this issue
- State: closed
- Created 3 years ago
- Comments: 50 (15 by maintainers)
I would instead advise to make everything little endian. Not just the header fields, but also the bytecodes. Almost all CPUs used today are little-endian, and support unaligned loads.
The current qoi.h implementation reads one byte at a time:
By replacing the b1, b2 and b3 uint8_t variables with a single b uint32_t variable, and changing the QOI_MASK_ETC bits from high bits to low bits, this could be:
Which might be faster. It certainly looks simpler.
From the OP:
I’ve mentioned it before (#12), but for comparison, here’s the 16-byte NIE header:
Like QOI, I also never expect to define a “version 2”, so the version byte looks useless at first, but I’ve also learned over the years to never say never.
Also, the version byte is 0xFF, which guarantees that the header is both invalid ASCII and invalid UTF-8.
I think this is a bad idea. I have seen several file formats without size and it always looked like a bad idea.
The Entropy Coded Segment of JPEG is such an example. All other JPEG segments have a length. The ECS (Entropy Coded Segment) does not. The ECS ends when a 0xff byte is not followed by a 0 byte. When decoding, the byte sequence 0xff 0 needs to be converted to 0xff. Every other JPEG segment can be skipped easily, because it has a length. An ECS must be processed byte by byte to find its end. This is absolutely crazy.
If the size is missing you need to process everything to get to the end of the data. The size opens the opportunity to have other information afterwards, e.g. another header with another image, or meta information. With a size field you can just jump to the information after the image without processing it. The size can also be used as redundant information (afterwards the file must end). I now think that the “channels” field is not needed, but a “size” field definitely makes sense.
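A minimal sketch of that skipping argument, assuming the 16-byte header proposed in the OP with a big-endian u32 size at byte offset 12. The function and macro names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Header length taken from the 16-byte proposal in the OP (an assumption). */
#define PROPOSED_HEADER_LEN 16

/* Read a big-endian 32-bit value from four bytes. */
static uint32_t load_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns the offset of the first byte after the image data, or 0 if the
   buffer is too short to hold the header plus the declared data size.
   No pixel data is decoded to find that offset. */
static size_t offset_after_image(const uint8_t *buf, size_t buf_len) {
    if (buf_len < PROPOSED_HEADER_LEN) {
        return 0;
    }
    uint32_t size = load_be32(buf + 12);
    if (size > buf_len - PROPOSED_HEADER_LEN) {
        return 0;
    }
    return PROPOSED_HEADER_LEN + (size_t)size;
}
```

Without a size field, finding the same offset would require running the decoder over every chunk of the image.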
I propose having a u8 that specifies the bit size of the width and height values. This would let small images take up a few fewer bytes and make very large images possible.
It could also be useful for LUTs that are 1xLARGEVALUE.
Yes. If there’s not any severe mistakes in my previous post, this will be the final spec.
That’s a good point, but I will not entertain it for QOI. QOI_DIFF_24 is the only place where that would matter and I expect the CPU to spend most of the time with branch mispredicts. Sorry, I don’t want to erect a bike-shed 😃
I’m working with 48-bit uncompressed scans of film. I might have to create a qoi-48 fork then. 😃 Also given more and more monitors support higher bit depths than 24, it’s a sign of the times I think.
I do not 😃
You can store different planes in one file, where some of those are sRGB (e.g. a specular map(?)) while others are interpreted linearly (e.g. bump maps, normal maps). QOI just provides this info in the header. What the user does with this info is outside of this spec.
I believe there’s more gains to be had by rearranging/nesting the if-statements and other little tweaks. E.g. changing this range check else if (p < chunks_len) to a simple else leads to a similar speedup on my CPU.
Anyway, if we want to have the absolutely fastest encode/decode speed, this format would need to change a lot to accommodate SIMD instructions. It’s certainly a worthwhile endeavor, but again, QOI is done! Thank you 😃
I’ve come to the following conclusion:
Things that will change
- QOI_DIFF will shift -1, to be consistent with the range of a two’s complement int
- QOI_DIFF will explicitly allow wrap-around. Whether the encoder makes use of this is outside of the spec
- size field in the header will be removed
- width and height in the header will be widened to 32bit
- channels field will be added to the header. This is purely informative and will not change the behavior of the en-/decoder
- colorspace bitmap will be added to the header. This is purely informative and will not change the behavior of the en-/decoder.
The header then looks like this:
Things that will stay the same
- QOI_RUN_8 will stay at 1..32. Shifting it to 2..33 and letting the 1 case always be handled by QOI_INDEX or QOI_DIFF would needlessly complicate the spec and encoder
Any work on better compression methods, dividing an image into blocks to allow for better streaming and other features should be rolled in a successor of QOI. Maybe we can call that QGI then 😃
I’ll try to implement these changes today.
I learned a lot; thank you all so much for your input! ❤️
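The wrap-around rule for QOI_DIFF listed above can be illustrated with a small sketch. It assumes 8-bit channels and plain modulo-256 arithmetic; the helper names are hypothetical and not taken from qoi.h.

```c
#include <stdint.h>

/* Hypothetical helper: difference between two 8-bit channel values,
   computed modulo 256 and reinterpreted as a signed value in -128..127. */
static int wrapped_diff(uint8_t prev, uint8_t cur) {
    uint8_t d = (uint8_t)(cur - prev);
    return d < 128 ? (int)d : (int)d - 256;
}

/* Decoder side: apply a difference with 8-bit wrap-around. */
static uint8_t apply_diff(uint8_t prev, int diff) {
    return (uint8_t)(prev + diff);
}
```

With wrap-around allowed, going from 255 to 0 is a difference of +1 rather than -255, so it fits even the smallest 2-bit range. An encoder that ignores this still produces valid files; it just misses some small diffs.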
Not all images are square. Especially in science it isn’t unbelievable that you could want to encode a 4GB image that’s 2^20 by 2^10 pixels (for example). Since the cost is only 32 bits extra, I think it’s worth the extra flexibility.
The maximum RGB size with 16-bit width/height is 12GB and would already take around 30 seconds to decode (although the current implementation fails at 2GB). For these and larger images, you really want the format to support multi-threaded decoding (chunks/tiles), and probably zoom levels. This isn’t “simple”, so I’d keep the 16-bit size limit to stop people misusing the format.
On color spaces… I’m one of those ‘experts’ the HN thread is complaining about 😃
If the user is doing complex image operations in a color accurate space, they’re going to be using, most likely OpenEXR, because the values are important, as is the colorimetric information, and there’s a lot more information that needs to be stored beyond color space (as is pointed out in the HN thread).
The intent of QOI is fast compress/decompress of data, obviously in a real time context.
The data QOI is going to be used with is therefore going to be either linear or sRGB. Drawing a line in the sand on those two seems like a clear, useful choice.
There’s a rub though. It’s normal to store linear alpha with sRGB data. Also, it’s common to store linear data in color channels, for example, in a normal map or other material input such as “metalness” in a typical PBR set up.
I propose that it makes sense to simply make the rule that individual channels are either linear-and-user-interpreted, or sRGB-with-gamma, and have the format include a byte with a bitmask indicating which channel is which.
I also propose that a bit of 0 would mean gamma, and 1 linear, and ordered such that the A channel lands in the LSB. A mask of 0 therefore might mean all gamma, and a mask of 1 would mean sRGB and linear alpha, regardless of the number of channels. That would make the two most common combinations result in a byte of zero and one, for any number of channels with alpha. The conciseness of only needing to care about zero or one in general appeals to me.
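A sketch of how such a per-channel mask could be queried. Only the position of the A channel (the LSB) and the meaning of the bit values come from the proposal above; the order of the remaining channel bits and all names are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical bit assignment: A is the LSB as proposed above.
   The order of the remaining channels is an assumption. */
enum {
    CS_LINEAR_A = 1 << 0,
    CS_LINEAR_B = 1 << 1,
    CS_LINEAR_G = 1 << 2,
    CS_LINEAR_R = 1 << 3
};

/* Returns 1 if the channel's bit is set (linear), 0 if clear (sRGB gamma). */
static int channel_is_linear(uint8_t mask, uint8_t channel_bit) {
    return (mask & channel_bit) != 0;
}
```

Under this layout a mask of 0 reads as "everything gamma-encoded" and a mask of 1 as "sRGB color with linear alpha", which matches the two common cases described above.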
Similarly, for people wanting to stuff EXIF or other metadata after the, uh, non-meta data, you could instead use something based on IFF / RIFF / TIFF.
Indeed.
One option for “a sequence of QOI images”, without an explicit size field, is to present a TAR file that contains QOI files. It’s streaming / single-pass encoding friendly.
@phoboslab has just said that “the data format is now fixed!”.
Let’s move the bikeshedding to: https://github.com/nigeltao/qoi2-bikeshed/issues
Note that GitHub is having some server issues right now.
I spent some more time thinking about how I could use qoi files in some of my applications and realized that the suggested 8 bits of custom information might not necessarily be enough for me.
What if, instead of storing the custom information in the header itself, we store an offset between the end of the header and the start of the image itself? This way people could add their own information as some kind of secondary header, which would be completely ignored (besides a getter maybe) by qoi.
Seconded. The truth is that little-endian “has won”: everything performance-sensitive today should use little-endian, which makes it slightly faster on little-endian CPUs and slightly slower on big-endian CPUs, because practically everything today uses little-endian CPUs. (x86 and x86-64, Apple M1, iPhone, Samsung phones, the vast majority of smartphones: it’s all little-endian. The last Apple big-endian system was released in 2005; they transitioned to little-endian in 2006 and have been little-endian ever since.)
This isn’t about portability. It’s about possible compiler optimizations: https://godbolt.org/z/3brMonWrE
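The linked Compiler Explorer snippet is not reproduced here. As a general illustration of the kind of optimization at stake, and not necessarily what the link shows, the sketch below contrasts a portable byte-wise little-endian load with a native-endian load through memcpy. Optimizing compilers commonly turn the first form into a single load on little-endian targets, but whether they do depends on the compiler and flags. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Portable: assembles a little-endian 32-bit value from four bytes.
   Gives the same result on any host, regardless of its endianness. */
static uint32_t load_le32_bytes(const uint8_t *p) {
    return  (uint32_t)p[0]        | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Native-endian load through memcpy, which avoids unaligned-access and
   aliasing problems. Matches load_le32_bytes only on little-endian hosts. */
static uint32_t load_native32(const uint8_t *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```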