jj: The ContentHash implementation for std::Option is inconsistent with the documentation

The documentation for the ContentHash trait states that:

Enums should hash a 32-bit little-endian encoding of the ordinal number of the enum variant, then the variant’s fields in lexical order.

This does not match the implementation for std::Option, which actually hashes the enum variant’s ordinal number as u8, presumably with architecture specific endianness.

https://github.com/martinvonz/jj/blob/2e64bf83fd9b8abf4c9880482ea4ce19492f3139/lib/src/content_hash.rs#L76-L86

In state.update(&[0]), the &[0] is inferred as a &[u8] due to the function signature of digest::Update.update).

I’m not sure what the implication is of changing the implementation after the fact. Presumably things will break. I think therefore we should update the documentation to say that the ordinal number should be hashed as a LE u8, and we should write the implementation as state.update(&0.as_le_bytes()) to ensure portability.

About this issue

Original URL
State: closed
Created 5 months ago
Comments: 15 (2 by maintainers)

Commits related to this issue

WIP: Derive ContentHash for Enums ISSUE=#3051 — committed to martinvonz/jj by emesterhazy 5 months ago
Add support for deriving ContentHash for Enums This commit also updates the documentation regarding how ContentHash should be implemented for enums to address the concerns raised in #3051. — committed to martinvonz/jj by emesterhazy 5 months ago
Fix the ContentHash implementations for std::Option and MergedTreeId The `ContentHash` documentation specifies that implementations for enums should hash the ordinal number of the variant contained i... — committed to martinvonz/jj by emesterhazy 5 months ago
Fix the ContentHash implementations for std::Option and MergedTreeId The `ContentHash` documentation specifies that implementations for enums should hash the ordinal number of the variant contained i... — committed to martinvonz/jj by emesterhazy 5 months ago
Fix the ContentHash implementations for std::Option and MergedTreeId The `ContentHash` documentation specifies that implementations for enums should hash the ordinal number of the variant contained i... — committed to martinvonz/jj by emesterhazy 5 months ago
Fix the ContentHash implementations for std::Option, MergedTreeId, and RemoteRefState The `ContentHash` documentation specifies that implementations for enums should hash the ordinal number of the va... — committed to martinvonz/jj by emesterhazy 5 months ago

Most upvoted comments

RE: https://github.com/martinvonz/jj/issues/3051#issuecomment-1945312868, commit https://github.com/martinvonz/jj/commit/2447dfeed8ab90f035f6526f2cc2fce1bbe83930 suggests that you’re right and changing the way a type hashes won’t break anything. Perhaps we should document above the trait that the hash value should be computed once and persisted and not used for integrity checking or re-verification.

Assuming this is correct, then I’d suggest we just change the implementation for std::Option to match the documentation and use a u32 of the ordinals instead of u8.

This is probably a question for @Ralith, but I’m wondering why we can’t simply use the built in Hash trait and need ContentHash instead. Is it just for more control over the hash algorithm?

emesterhazy on Feb 15, 2024

I’m not sure this is an issue with the current macro actually since I don’t think it supports enum types.

Oh, I meant to CC @Ralith on #3054 as the overall tracking issue, in case he would find any of the linked PRs interesting.

martinvonz on Feb 15, 2024

I’m not sure what the implication is of changing the implementation after the fact. Presumably things will break.

iirc, changing the ContentHash value won’t cause problems. New serialized view files may be created with the same content, but that’s okay. We don’t use the hash value to verify the file content.

yuja on Feb 15, 2024