modular-bitfield: Allow returning bytes in MSB order

Currently (at least on my platform, ARMv7), bytes are returned in LSB order but my driver requires them to be MSB. How tricky would it be to add a configuration option for this?

Example:

#[bitfield]
pub struct MyBitfield {
  foo: B16,
}

let test = MyBitfield::new().with_foo(0x1234u16);

test.as_bytes() returns [0x34, 0x12] but I need it to be [0x12, 0x34]. While I could reverse the order of the result from as_bytes, it is error-prone and would IMO be better captured by the bitfield itself.

About this issue

  • Original URL
  • State: open
  • Created 4 years ago
  • Reactions: 3
  • Comments: 27 (11 by maintainers)

Most upvoted comments

Going a step further… The internal bit storage order being able to be reversed would be nice. Trying to write a driver for this (look at page 63+) and am finding that a crate like this would be quite helpful to define how to speak with the device, but the LE ordering of the crate makes it so I have to do a lot of driver internal fiddling, even on the i8 register values. It gets way more obvious when you start looking at i16 and i24 values thanks to the combination of some registers. Even with a into_be_bytes option this crate wouldn’t let me store correct values in such cases.

I know you can swap the SPI controllers read/write mode to support LSB, but most controllers I’ve seen are MSB by default. It’d be nice to have a “protocol mode” as a result (for lack of a better term).

I’m interested in this feature as well. Getting bitfields and endianness to work correctly is very tricky and needs a library solution.

From my experience, the only way to handle this situation is to carefully define a bitfield layout in a way that’s independent of endianness. You do this by defining it in terms of the underlying representation. So taking the example from the previous comment and using #[repr(u16)], it has the following layout:

 1  1
 5  4                    0
--------------------------
|B1|        B2           |
--------------------------

Notice that this layout works regardless of the machine endianness: within the 16 bits of a u16, the MSB is B1 and everything else is B2. No bytes are mentioned in this definition.

(Note, however, that the order of the fields within this representation is completely ambiguous. B1 could be the LSB, for example. In fact, GCC actually reverses the field layout based on endianness. This is a mistake IMO, and should not be repeated by this library. Pick one order, document it, and stick to that. Backwards compatibility will likely determine your choice.)

As a separate step, you define the byte order when this representation needs to be viewed as bytes. There are three options: host endian, little endian, and big endian. The representation in memory should always be in the host endianness (so the underlying u16 is just that), with options to read and write it explicitly in either big or little endian. In the end, that means providing from_le_bytes and from_be_bytes. Also, don’t forget from_ne_bytes for native endianness (like Rust does with primitive types, which is guaranteed to be fast). I’d also suggest renaming to to_{le,be,ne}_bytes for consistency with std.

The above solution works with every interface I’ve worked with and layout I’ve seen from a C or C++ compiler. It’s also compatible with how people do bitfields manually (bit shifts on integer values). (With the possible exception of mixed endianness mentioned above, though to be honest I don’t even know what that really means in this context. Regardless, I think it’s rare enough to ignore from a library perspective.)

This interpretation apparently departs from the current implementation, which uses byte arrays under the hood. To be frank, I think that’s a dubious implementation to begin with. However, it does allow for space optimization when the number of bits is something like 24 bits instead of 32, and it allows for very large bitfields. I’m not sure the advantages are significant (you can break up a bitfield if needed and the space overhead is likely minimal). As a compromise if that’s still desired, I suggest using a byte array when a representation isn’t provided, but consider not providing a way to access the bytes.


In summary, I recommend doing the following:

  1. Under the hood, store the data using the provided representation rather than a byte array. If none is provided, choose the next larger unsigned integer. Limit bitfields to 128 bits.
  • Alternatively, use a byte array by default, but with no access to the array or access but with no guarantees about layout.
  1. Remove into_bytes and from_bytes
  2. Provide in_le_bytes, in_be_bytes, in_ne_bytes, from_le_bytes, from_be_bytes, and from_ne_bytes. These should basically just be calls to the equivalent function provided by the underlying type.

Yeah, in the meanwhile, I have even worked with protocols which use mixed LSB/MSB encoding (yes, i am talking to you, IEEE 802.15.4).

Probably the proper way is to introduce LE16 and BE16 types in place of B16 so that the endianness can be specified on a field by field basis. This gets very close to the zerocopy crate, but that one doesn’t have bitfield support (which is kind of the selling point of this library), so I don’t see a conflict here. We would need to define what the odd types (e.g. B15) means for the two variants. For example:

#[bitfield]
pub struct MyBitfield {
  foo: B15,
  bar: B1,
}

How does the memory layout when foo is stored as LE differ from the layout when foo is stored BE? What does it even mean to define endianness for a 15 bits? Maybe we should just not support this for non-even fields (i.e. LE16 but no LE15)?

Hm, I think I see the use case (correct me if I’m wrong)… imagine it’s some HW register somewhere, you could:

#[bitfield]
struct WwdtTimeReg {
   count: B24;
   reserved: B8;
};

let reg = 0x4000_000c as *WwdtTimeReg;
let reg: &WwdtTimeReg = reg.as_ref().unwrap();
...
let current_count = reg.count();

vs. having to do a two-step process:

let reg = 0x4000_000c as *u32;
...
let register_value = WwdtTimeReg::from_le_bytes(*reg); // Sorry I forget the exact syntax right now
let current_count = register_value.count();

It’d prevent you from being able to atomically set/get all fields simultaneously, but that may be fine. In this case there’s only one interesting field anyway.