qoi: Upcoming breaking changes & locking in the data format

Saying that I’m surprised by the amount of attention this is getting would be an understatement. There’s lots of discussion going on about how the data format and compression could be improved and what features could be added.

I want to give my views here and discuss how to go forward.

First and foremost, I want QOI to be simple. Please keep this in mind. I consider the general compression scheme to be done. There’s lots of interesting ideas on how to improve compression. I want to tinker with these ideas - but not for QOI.

QOI will not be versioned. There will only be one version of QOI’s data format. I’m hoping we will be able to strictly define what exactly that is in the coming days.

QOI will only support 24bit RGB and 32bit RGBA data. I acknowledge there’s some need for fewer or more channels and also for higher bit depths or paletted color - QOI will not serve these needs.

So, with all that said, there are some breaking changes that are probably worthwhile. I want to discuss whether and how to implement them.

Proposed changes

  1. width, height and size in the header should be stored as big endian for consistency with the rest of the format (this change already happened in https://github.com/phoboslab/qoi/commit/c03edb2f2658c13b5f84ee59e3469e0b0b6eb5d1)

  2. Color differences (QOI_DIFF_*) should be stored with the same range as a two’s-complement integer. That means:

  • 2bit: -2..1 instead of the current range -1..2
  • 4bit: -8..7 instead of the current range -7..8
  • 5bit: -16..15 instead of the current range -15..16
  3. The header should accommodate some more info. Currently there’s demand for 3a) the number of channels (#16), 3b) the colorspace (#25 and this huge discussion on HN), 3c) un-/premultiplied alpha (#13), and 3d) user-defined values

So, 1) is already implemented; 2) seems like the right thing to do (any objections?); 3) is imho worth discussing.

3a) Storing the number of channels (3 or 4) in the header would allow a user of this library to omit specifying whether they want RGB or RGBA, and files would be more descriptive of their contents. You would still be able to enforce 3 or 4 channels when loading. This is consistent with what stbi_load does:

int x,y,n;
unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
// ... process data if not NULL ...
// ... x = width, y = height, n = # 8-bit components per pixel ...
// ... replace '0' with '1'..'4' to force that many components per pixel
// ... but 'n' will always be the number that it would have been if you said 0

It is my opinion that the channels header value should be purely informative. Meaning, the en-/decoder will behave exactly the same regardless of the number of channels. The extra 5 bits for alpha in QOI_DIFF_24 will still be wasted for RGB files.

3b) I don’t understand enough about the colorspace issue to gauge its significance. If we implement this, however, I would suggest giving it a full byte in the header, where 0 = sRGB and any non-zero value is another, user-defined(?) colorspace.

3c) I’m against an option for premultiplied alpha, because it puts more burden on any QOI implementation to decode in the right pixel format. We should just specify that QOI images have un-premultiplied alpha.

3d) For simplicity’s sake I’d like to put 3a) and 3b) as one byte each into the header. I’m uncertain whether we should then “pad” the u32 size in the header with two more bytes. This would make the header 4-byte aligned again, but there’s probably no need for it!? A u16 unused field could also cause confusion when other QOI libraries suddenly specify some of these bits to mean something.

With all this, the header would then be the following 16 bytes:

struct qoi_header_t {
	char magic[4];  // magic bytes "qoif"
	u16 width;      // image width in pixels (BE)
	u16 height;     // image height in pixels (BE)
	 u8 channels;   // must be 3 (RGB) or 4 (RGBA)
	 u8 colorspace; // 0 = sRGB (other values currently undefined)
	u16 unused;     // free for own use
	u32 size;       // number of data bytes following this header (BE)
};

The one issue I have with this is how to hand these extra header values to the user of this library. qoi_read("file.qoi", &w, &h, &channels_in_file, &colorspace, want_channels) looks like an ugly API. So maybe that would rather be implemented as qoi_read_ex(), with qoi_read() staying as it is. I’m still not sure if I want that extended header…

What’s the opinion of the other library authors?

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 50 (15 by maintainers)

Most upvoted comments

width, height and size in the header should be stored as big endian for consistency with the rest of the format

I would instead advise to make everything little endian. Not just the header fields, but also the bytecodes. Almost all CPUs used today are little-endian, and support unaligned loads.

The current qoi.h implementation reads one byte at a time:

if ((b1 & QOI_MASK_4) == QOI_DIFF_24) {
  int b2 = bytes[p++];
  int b3 = bytes[p++];
  px.rgba.r += (((b1 & 0x0f) << 1) | (b2 >> 7)) - 15;
  px.rgba.g +=  ((b2 & 0x7c) >> 2) - 15;
  px.rgba.b += (((b2 & 0x03) << 3) | ((b3 & 0xe0) >> 5)) - 15;
  px.rgba.a +=   (b3 & 0x1f) - 15;
}

By replacing the b1, b2 and b3 uint8_t variables with a single b uint32_t variable, and changing the QOI_MASK_ETC bits from high bits to low bits, this could be:

if ((b & QOI_MASK_4) == QOI_DIFF_24) {
  px.rgba.r += ((b >>  4) & 31) - 15;
  px.rgba.g += ((b >>  9) & 31) - 15;
  px.rgba.b += ((b >> 14) & 31) - 15;
  px.rgba.a += ((b >> 19) & 31) - 15;
  p += 3;
}

Which might be faster. It certainly looks simpler.

From the OP:

struct qoi_header_t {
	char magic[4];  // magic bytes "qoif"
	u16 width;      // image width in pixels (BE)
	u16 height;     // image height in pixels (BE)
	 u8 channels;   // must be 3 (RGB) or 4 (RGBA)
	 u8 colorspace; // 0 = sRGB (other values currently undefined)
	u16 unused;     // free for own use
	u32 size;       // number of data bytes following this header (BE)
};

I’ve mentioned it before (#12), but for comparison, here’s the 16-byte NIE header:

struct nie_header_t {
    u32 magic;
    u8  version;
    u8  order;    // BGRA vs BGRX vs RGBA vs RGBX.
    u8  alpha;    // non-premul vs premul.
    u8  depth;    // 8-bit vs 16-bit.
    u32 width;
    u32 height;
};

Like QOI, I also never expect to define a “version 2”, so the version byte looks useless at first, but I’ve also learned over the years to never say never.

Also, the version byte is 0xFF, which guarantees that the header is both invalid ASCII and invalid UTF-8.

the size field in the header will be removed

I think this is a bad idea. I have seen several file formats without size and it always looked like a bad idea.

The Entropy Coded Segment of JPEG is such an example. All other JPEG segments have a length; the ECS (Entropy Coded Segment) does not. An ECS ends at the first 0xff byte that is not followed by a 0x00 byte, and in decoding the stuffed sequence 0xff 0x00 needs to be converted back to 0xff. Every other JPEG segment can be skipped easily because it has a length; an ECS must be processed byte by byte to find its end. This is absolutely crazy.

If the size is missing, you need to process everything to get to the end of the data. A size field opens the opportunity to put other information afterwards, e.g. another header with another image, or meta information. With a size field you can jump straight to the information after the image without processing it. The size can also serve as redundant information (the file must end right after it). I now think that the “channels” field is not needed, but a “size” field definitely makes sense.

Make width and height 32-bit.

I propose having a u8 that specifies the bit size of the width and height values. This would allow small images to take up a few fewer bytes, and make super-large images possible.

Could also be useful for LUTs that are 1xLARGEVALUE.

Yes. If there aren’t any severe mistakes in my previous post, this will be the final spec.

This isn’t about portability. It’s about possible compiler’s optimization: https://godbolt.org/z/3brMonWrE

That’s a good point, but I will not entertain it for QOI. QOI_DIFF_24 is the only place where it would matter, and I expect the CPU to spend most of its time on branch mispredicts anyway. Sorry, I don’t want to erect a bike-shed 😃

I’m working with 48-bit uncompressed scans of film, so I might have to create a qoi-48 fork then. 😃 Also, given that more and more monitors support bit depths higher than 24, it’s a sign of the times I think.

Bikeshedding alert: you could re-pack from 14 back to 12 bytes if you really wanted to:

I do not 😃

What does a 0b00000011 value mean? Is a “half sRGB, half other” color space a thing?

You can store different planes in one file, where some of those are sRGB (e.g. a specular map(?)) while others are interpreted linearly (e.g. bump maps, normal maps). QOI just provides this info in the header. What the user does with this info is outside of this spec.

Little-endian shows an approx 1.05x improvement in decode speed.

I believe there’s more gains to be had by rearranging/nesting the if-statements and other little tweaks. E.g. changing this range check else if (p < chunks_len) to a simple else leads to a similar speedup on my CPU.

Anyway, if we want to have the absolutely fastest encode/decode speed, this format would need to change a lot to accommodate SIMD instructions. It’s certainly a worthwhile endeavor, but again, QOI is done! Thank you 😃

I’ve come to the following conclusion:

Things that will change

  • the range of QOI_DIFF will shift -1, to be consistent with the range of a two’s complement int
  • QOI_DIFF will explicitly be allowed to wrap around. Whether the encoder makes use of this is outside of the spec
  • the size field in the header will be removed
  • width and height in the header will be widened to 32bit
  • a channels field will be added to the header. This is purely informative and will not change the behavior of the en-/decoder
  • a colorspace bitmap will be added to the header. This is purely informative and will not change the behavior of the en-/decoder.
  • the spec will mandate that the alpha channel is un-premultiplied

The header then looks like this:

struct qoi_header_t {
    char magic[4];  // magic bytes "qoif"
    u32 width;      // image width in pixels (BE)
    u32 height;     // image height in pixels (BE)
     u8 channels;   // must be 3 (RGB) or 4 (RGBA)
     u8 colorspace; // a bitmap 0000rgba where
                    //   - a zero bit indicates sRGB,
                    //   - a one bit indicates linear (user interpreted)
                    //   colorspace for each channel
};

Things that will stay the same

  • BE encoding will stay. To keep the library portable, implementers have to read 1 byte at a time anyway
  • Channel ordering will stay at RGBA
  • the range of QOI_RUN_8 will stay at 1..32. Shifting it to 2..33 and letting the 1 case always be handled by QOI_INDEX or QOI_DIFF would needlessly complicate the spec and encoder
  • the 4 bytes padding will stay to simplify range checks in the decoder

Any work on better compression methods, dividing an image into blocks to allow for better streaming, and other features should be rolled into a successor of QOI. Maybe we can call that QGI then 😃

I’ll try to implement these changes today.

I learned a lot; thank you all so much for your input! ❤️

The maximum RGB size with 16-bit width/height is 12GB

Not all images are square. Especially in science it isn’t unbelievable that you could want to encode a 4GB image that’s 2^20 by 2^10 pixels (for example). Since the cost is only 32 bits extra, I think it’s worth the extra flexibility.

Make width and height 32-bit.

The maximum RGB size with 16-bit width/height is 12GB and would already take around 30 seconds to decode (although the current implementation fails at 2GB). For these and larger images, you really want the format to support multi-threaded decoding (chunks/tiles), and probably zoom levels. This isn’t “simple”, so I’d keep the 16-bit size limit to stop people misusing the format.

On color spaces… I’m one of those ‘experts’ the HN thread is complaining about 😃

If the user is doing complex image operations in a color-accurate space, they’re most likely going to be using OpenEXR, because the values are important, as is the colorimetric information, and there’s a lot more that needs to be stored beyond the color space (as is pointed out in the HN thread).

The intent of QOI is fast compress/decompress of data, obviously in a real time context.

The data QOI is going to be used with is therefore going to be either linear or sRGB. Drawing a line in the sand at those two seems like a clear, useful choice.

There’s a rub though. It’s normal to store linear alpha with sRGB data. Also, it’s common to store linear data in color channels, for example, in a normal map or other material input such as “metalness” in a typical PBR set up.

I propose that it makes sense to simply make the rule that individual channels are either linear-and-user-interpreted, or sRGB-with-gamma, and have the format include a byte with a bitmask indicating which channel is which.

I also propose that a bit of 0 would mean gamma, and 1 linear, and ordered such that the A channel lands in the LSB. A mask of 0 therefore might mean all gamma, and a mask of 1 would mean sRGB and linear alpha, regardless of the number of channels. That would make the two most common combinations result in a byte of zero and one, for any number of channels with alpha. The conciseness of only needing to care about zero or one in general appeals to me.

Similarly, for people wanting to stuff EXIF or other metadata after the, uh, non-meta data, you could instead use something based on IFF / RIFF / TIFF.

There’s pros and cons of not having size as a mandatory field

Indeed.

One option for “a sequence of QOI images”, without an explicit size field, is to present a TAR file that contains QOI files. It’s streaming / single-pass encoding friendly.

@phoboslab has just said that “the data format is now fixed!”.

Let’s move the bikeshedding to: https://github.com/nigeltao/qoi2-bikeshed/issues

Note that GitHub is having some server issues right now.

I spent some more time thinking about how I could use qoi files in some of my applications and realized that the suggested 8 bits of custom information might not necessarily be enough for me.

What if, instead of storing the custom information in the header itself, we store an offset between the end of the header and the start of the image data? This way people could add their own information as a kind of secondary header, which would be completely ignored (besides a getter, maybe) by qoi.

I would instead advise to make everything little endian. Not just the header fields, but also the bytecodes. Almost all CPUs used today are little-endian, and support unaligned loads.


Seconded. The truth is that little-endian “has won”; everything performance-sensitive today should use little-endian, meaning it will be slightly faster on little-endian CPUs and slightly slower on big-endian CPUs, because practically everything today uses little-endian CPUs. (x86 and x86-64, Apple M1, iPhones, Samsung phones, the vast majority of smartphones: it’s all little-endian. The last Apple big-endian system was released in 2005; they transitioned to little-endian in 2006 and have been little-endian ever since.)

  • the last Apple hardware to use big-endian was from 2005
  • the last Intel CPU to use big-endian was from 2017 (and even that was just a higher-clocked release of a 2012 model, with no architectural improvements, just higher clock speeds…)

BE encoding will stay. To keep the library portable, implementers have to read 1 byte at a time anyway

This isn’t about portability. It’s about possible compiler’s optimization: https://godbolt.org/z/3brMonWrE