jj: The ContentHash implementation for std::Option is inconsistent with the documentation
The documentation for the ContentHash trait states that:
Enums should hash a 32-bit little-endian encoding of the ordinal number of the enum variant, then the variant’s fields in lexical order.
This does not match the implementation for std::Option, which actually hashes the enum variant’s ordinal number as u8, presumably with architecture specific endianness.
In state.update(&[0]), the &[0] is inferred as a &[u8] due to the function signature of digest::Update.update).
I’m not sure what the implication is of changing the implementation after the fact. Presumably things will break. I think therefore we should update the documentation to say that the ordinal number should be hashed as a LE u8, and we should write the implementation as state.update(&0.as_le_bytes()) to ensure portability.
About this issue
- Original URL
- State: closed
- Created 5 months ago
- Comments: 15 (2 by maintainers)
Commits related to this issue
- WIP: Derive ContentHash for Enums ISSUE=#3051 — committed to martinvonz/jj by emesterhazy 5 months ago
- Add support for deriving ContentHash for Enums This commit also updates the documentation regarding how ContentHash should be implemented for enums to address the concerns raised in #3051. — committed to martinvonz/jj by emesterhazy 5 months ago
- Fix the ContentHash implementations for std::Option and MergedTreeId The `ContentHash` documentation specifies that implementations for enums should hash the ordinal number of the variant contained i... — committed to martinvonz/jj by emesterhazy 5 months ago
- Fix the ContentHash implementations for std::Option and MergedTreeId The `ContentHash` documentation specifies that implementations for enums should hash the ordinal number of the variant contained i... — committed to martinvonz/jj by emesterhazy 5 months ago
- Fix the ContentHash implementations for std::Option and MergedTreeId The `ContentHash` documentation specifies that implementations for enums should hash the ordinal number of the variant contained i... — committed to martinvonz/jj by emesterhazy 5 months ago
- Fix the ContentHash implementations for std::Option, MergedTreeId, and RemoteRefState The `ContentHash` documentation specifies that implementations for enums should hash the ordinal number of the va... — committed to martinvonz/jj by emesterhazy 5 months ago
RE: https://github.com/martinvonz/jj/issues/3051#issuecomment-1945312868, commit https://github.com/martinvonz/jj/commit/2447dfeed8ab90f035f6526f2cc2fce1bbe83930 suggests that you’re right and changing the way a type hashes won’t break anything. Perhaps we should document above the trait that the hash value should be computed once and persisted and not used for integrity checking or re-verification.
Assuming this is correct, then I’d suggest we just change the implementation for
std::Optionto match the documentation and use a u32 of the ordinals instead of u8.This is probably a question for @Ralith, but I’m wondering why we can’t simply use the built in
Hashtrait and needContentHashinstead. Is it just for more control over the hash algorithm?Oh, I meant to CC @Ralith on #3054 as the overall tracking issue, in case he would find any of the linked PRs interesting.
iirc, changing the
ContentHashvalue won’t cause problems. New serialized view files may be created with the same content, but that’s okay. We don’t use the hash value to verify the file content.