flate2-rs: Creating a slice of uninit memory is unsound (equivalent to mem::uninitialized)

flate2 creates slices of uninitialized memory in several places. The only two places where it happens when using the Rust backend are here and here.

This is equivalent to the use of the now-deprecated mem::uninitialized. Instead, either a slice of MaybeUninit<T> should be constructed and passed to the backends, or the backends should receive a structure that does not expose uninitialized memory by design such as Vec or a Vec-like fixed-capacity view of memory.

If the backend does not overwrite the entire slice, this can become an exploitable security vulnerability.

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 21 (20 by maintainers)

Commits related to this issue

Most upvoted comments

What is the status of this? Are these concerns purely hypothetical, or is there potentially unsoundness in the library?

For the rust back-end we shouldn’t really need to be using unsafe stuff unless there is some very good reason for it. Not sure if say avoiding zero-initializing a vector is worth it. miniz_oxide already avoids using unsafe without any drastic performance implications, so not sure why flate2 would need to either.

Thanks for chiming me in - I was too busy in the past days but am glad I join now since all information has been gathered so neatly. It took an hour to catch up on all that was said, with a considerable portion spent on reading Ralf Jung’s excellent articles (again, probably, I keep forgetting).

With that said, here is my amateur perspective on uninitialized memory:

  • it’s like Option<u8> and hence observing it is forbidden until it becomes Some(v). If it’s observed anyway, it’s UB.
  • Creating a slice from uninitialized memory is UB because it allows reading. Here it’s less about whether it’s actually read, but that it can be read, and strange optimizations can happen if this is done. The compiler can also insert spurious reads for optimization, which may cause said memory to be read even though it looks like it will only be written to.

So for me it’s also established that the current behaviour of compress_vec needs fixing.

I think it worth spelling out that correctness always trumps performance, so performance considerations shouldn’t hold back a simple fix - optimizations, possibly like discussed, can be made later if there is demand.

As a general stance, we will avoid breaking changes as these will ripple through the ecosystem with a lot of churn, so any fix here must be none-breaking.

From the above it seems clear that #373 is a suitable and welcome fix.

Regarding performance

gitoxide is an avid user of flate2. The most performance sensitive code is stream decompression which happens to use decompress() with its own buffer management. I spent some time profiling the code to arrive there, and it appears that those who truly are dependent on the fastest possible implementation wouldn’t use (de)compress_vec() anyway. gitoxide settled for reusing allocated buffers.

It’s notable that gitoxide still doesn’t re-use inflate state which it truly should for even more performance, as this code is called a whole lot, but I digress.

Conclusion

Yes, let’s merge the fix and not worry about performance as those with performance concerns will have suitable alternatives. Correctness is more important.

Maybe there is more spots of code that have soundness issues like this? I’d love to see them rooted out. Thanks everyone for your help and engagement, it’s much appreciated. I am grateful too as I feel like I became a little more aware and could adjust my mental model towards uninitialized memory. I think gitoxide also has those issues and I will revisit them (i.e. v = Vec::with_capacity(x); v.set_len(x); &mut v).

it’s UB automatically, regardless of what it allows

Uninitialised memory is even meaner than I thought 😅.

I will prepare a new release now.

The approach of writing to the spare capacity without zeroing it first sounds good, provided that the C library wrappers are adjusted to write to a &mut [MaybeUninit<u8>].