jj: The ContentHash implementation for std::Option is inconsistent with the documentation

The documentation for the ContentHash trait states that:

Enums should hash a 32-bit little-endian encoding of the ordinal number of the enum variant, then the variant’s fields in lexical order.

This does not match the implementation for std::Option, which actually hashes the enum variant’s ordinal number as u8, presumably with architecture specific endianness.

https://github.com/martinvonz/jj/blob/2e64bf83fd9b8abf4c9880482ea4ce19492f3139/lib/src/content_hash.rs#L76-L86

In state.update(&[0]), the &[0] is inferred as a &[u8] due to the function signature of digest::Update.update).

image

I’m not sure what the implication is of changing the implementation after the fact. Presumably things will break. I think therefore we should update the documentation to say that the ordinal number should be hashed as a LE u8, and we should write the implementation as state.update(&0.as_le_bytes()) to ensure portability.

About this issue

  • Original URL
  • State: closed
  • Created 5 months ago
  • Comments: 15 (2 by maintainers)

Commits related to this issue

Most upvoted comments

RE: https://github.com/martinvonz/jj/issues/3051#issuecomment-1945312868, commit https://github.com/martinvonz/jj/commit/2447dfeed8ab90f035f6526f2cc2fce1bbe83930 suggests that you’re right and changing the way a type hashes won’t break anything. Perhaps we should document above the trait that the hash value should be computed once and persisted and not used for integrity checking or re-verification.

Assuming this is correct, then I’d suggest we just change the implementation for std::Option to match the documentation and use a u32 of the ordinals instead of u8.

This is probably a question for @Ralith, but I’m wondering why we can’t simply use the built in Hash trait and need ContentHash instead. Is it just for more control over the hash algorithm?

I’m not sure this is an issue with the current macro actually since I don’t think it supports enum types.

Oh, I meant to CC @Ralith on #3054 as the overall tracking issue, in case he would find any of the linked PRs interesting.

I’m not sure what the implication is of changing the implementation after the fact. Presumably things will break.

iirc, changing the ContentHash value won’t cause problems. New serialized view files may be created with the same content, but that’s okay. We don’t use the hash value to verify the file content.