littlefs: Block-cycles/revision-count improvements
I have a system where I have an AES encryption layer between the littlefs code and the actual flash back-end. That means that when flash is erased, it goes to a known consistent value for a given offset (for a given key). I’ve seen lfs_format(…) fail depending on this key which changes the random data coming back for erased flash. One of the times I hit this call:
returning LFS_ERR_CORRUPT. When this occurred, lfs_format(…) didn’t work. I don’t know enough about the inner workings of lfs, but it seems like when lfs_format(…) uses this it should compensate for the “corruption” and actually attempt to write out to flash.
I’ve attached the output of the AES decryption on the erased values of the flash that hit this error which corresponds to the issue I had above. You can see this fail to format with the lfs_fuse project, too:
$ ./lfs /dev/loop0 --block_size=4096 --read_size=16 --prog_size=16 --cache_size=4096 --lookahead_size=128 --block_cycles=100 --format
littlefs/lfs.c:997:error: Corrupted dir pair at {0x1, 0x0}
Thanks, Tim
About this issue
- Original URL
- State: open
- Created 4 years ago
- Reactions: 1
- Comments: 25 (15 by maintainers)
Here are the four method’s disassembly on ARM when compiling for -Os: https://godbolt.org/z/9sKP9r
Sorry, I probably should have explained it better. 😃 But yes, since that lowest bit dictates even/odd, it could be used to decide offsetting the eviction modulus result, and then that bit gets thrown away in the overall divide by 2 before the modulus is checked. Same overall result as making the modulus not contain the prime factor 2, but allows for better optimizations by the compiler. 😃
Hopefully the compiler would recognize you’re either multiplying by 0 or 1 here and just reduce it to a conditional add instruction in the underlying architecture. I suppose for code readability you might actually have it a separate line:
Ah, I see, thanks for the clarification!
Hi @dpgeorge -
It completely mitigates the issue I observed and captured in the original ticket here. There’s a sample case as an attachment with the LFS mount parameters to replicate the issue I saw. The commit in #490 fixes the issue with that attachment, and I’m pretty sure it’ll fix it for all cases that have the same issue (e.g. eviction, which expects a valid filesystem, during lfs_format() where there isn’t a valid filesystem yet). The eviction logic is just a counter and it runs the eviction code if it modulo X is 0. So if the existing contents caused that to occur, the eviction logic said “corrupt filesystem” and this trickled out into the lfs_format() call with the LFS_ERR_CORRUPT error code.
I should probably just mark this as closed as the fix is now in the “devel” branch. It’s just not in any formal release yet.
Hi @geky -
No problem! A few other observations I was thinking about:
(rev >> 1) % block_cycles- e.g. discarding the lowest bit. The advantage of this approach is that it could be optimized to bit-wise operations in the static configuration case above. (Technically the current eviction logic is probably slightly better from a “lots of writes in a row perspective” since you won’t hit two erases in a row as easily for the directory pair.) (Of course, you’d need a minimum on the modulus here of 2 or something.) You could maintain the out of phase thing here (and allow it to be optimized to bit-wise operations when static configurations come along) by doing((rev + ((rev & 1) ? block_cycles : 0)) >> 1) % block_cycles(untested math, but I think it checks out). In this case you’d still do the rev + 1 logic elsewhere. You’d need to avoid incrementing block_cycles by 1 to get the bit-wise optimization with static configs of course, so it might be better to implement a “minimum” block cycles value via a helper function getter or something?Anyhow… I don’t know if any action needs to happen with the above thoughts - just random musings. 😃
– Tim
I did a little more investigation today, and it looks like:
For a more normal flash erased value of all 0xff’s or 0x00’s, the initial revision gets set to 0, so the initial case always works.
It seems like the lfs_dir_alloc(…) is attempting to preserve the existing revision counter for a given block, so that lfs_dir_compact(…) later can decide whether or not it needs to move blocks or not. If it does this across the board a filesystem on top of an encrypted layer in there will end up with random counters for evicting a given flash block across all of the blocks. This isn’t strictly a problem, but in the format case it seems like it shouldn’t do this at all since there are “unknown” values in the superblock location.
Should the format case ignore the current revision counter in the superblock, or preserve it? One easy way to fix this particular case would be to do the following inside lfs_format(…):
Alternatively, during a lfs_format(…) only, we’d need to consider a “CORRUPT” case inside lfs_fs_size(…) to be grounds to continue formatting anyways. That’d likely need additional arguments trickled down into it though.