littlefs: assertion "block < lfs->cfg->block_count" failed

I’m running LittleFS on a 16mbyte SPI flash on ESP32 and it’s been working well, but last night I ran into the above assert. Here’s the stack trace:

assertion "block < lfs->cfg->block_count" failed: file "/Users/jason/Projects/x/x-esp32-firmware/components/littlefs/littlefs/lfs.c", line 27, function: lfs_cache_read
abort() was called at PC 0x400e801b on core 0
0x400e801b: __assert_func at /home/jeroen/esp8266/esp32/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdlib/../../../.././newlib/libc/stdlib/assert.c:63 (discriminator 8)


Backtrace: 0x40094614:0x3fff2950 0x4009480f:0x3fff2970 0x400e801b:0x3fff2990 0x401703f9:0x3fff29c0 0x40170daa:0x3fff29f0 0x40171472:0x3fff2a40 0x400dbb19:0x3fff2ac0 0x400dabe4:0x3fff2af0 0x401381ad:0x3fff2b70 0x400d99dd:0x3fff2b90 0x400d9a35:0x3fff2bc0
0x40094614: invoke_abort at /Users/jason/esp/esp-idf/components/esp32/./panic.c:648

0x4009480f: abort at /Users/jason/esp/esp-idf/components/esp32/./panic.c:648

0x400e801b: __assert_func at /home/jeroen/esp8266/esp32/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdlib/../../../.././newlib/libc/stdlib/assert.c:63 (discriminator 8)

0x401703f9: lfs_cache_read at /Users/jason/Projects/x/x-esp32-firmware/components/littlefs/littlefs/lfs.c:1069

0x40170daa: lfs_ctz_traverse at /Users/jason/Projects/x/x-esp32-firmware/components/littlefs/littlefs/lfs.c:1069

0x40171472: lfs_traverse at /Users/jason/Projects/x/x-esp32-firmware/components/littlefs/littlefs/lfs.c:1069

0x400dbb19: x_fs_get_usage at /Users/jason/Projects/x/x-esp32-firmware/main/./x_fs_littlefs.c:178

Note that I have changed the names of some symbols and files for privacy purposes.

This is happening when I try to use lfs_traverse to get filesystem information. Here’s the code I use for that:

static int lfs_traverse_count(void *p, lfs_block_t b) {
    *(lfs_size_t *)p += 1;
    return 0;
}

esp_err_t x_fs_get_usage(
		uint64_t* bytes_total,
		uint64_t* bytes_free,
		uint64_t* bytes_used) {

	lfs_size_t in_use = 0;
	int err = lfs_traverse(&lfs, lfs_traverse_count, &in_use);
	if (err) {
		return err;
	}

	*bytes_total = cfg.block_count * cfg.block_size;
	*bytes_used = cfg.block_size * in_use;
	*bytes_free = *bytes_total - *bytes_used;

	return ESP_OK;
}

Before this happened, I saw two instances of logging of bad blocks:

lfs debug:1229: Bad block at 3252
lfs debug:1229: Bad block at 3253
lfs debug:1229: Bad block at 3254
lfs debug:1229: Bad block at 3255
lfs debug:1229: Bad block at 3256


lfs debug:1229: Bad block at 436
lfs debug:1229: Bad block at 437
lfs debug:1229: Bad block at 438
lfs debug:1229: Bad block at 439
lfs debug:1229: Bad block at 440

I have since written a simple block checker and run it against the flash, and found no bad blocks. Here’s that code:

	size_t s_size = 4096;
	uint8_t* buffer_work = malloc(s_size);
	uint8_t* buffer_compare = malloc(s_size);
	for (int i = 0; i < (16 * 1024 * 1024) / s_size; i++) {
		// erase block, read it, check all values are 0xff
		memset(buffer_compare, 0xff, s_size);
		memset(buffer_work, 0x00, s_size);
		x_hal_bulk_storage_erase(i * s_size, s_size);
		x_hal_bulk_storage_read(i * s_size, s_size, buffer_work);
		if (memcmp(buffer_compare, buffer_work, s_size) != 0) {
			printf("%08x: erase failed\n", i);
			continue;
		}

		// write bit pattern, read it, compare
		memset(buffer_work, 0b10101010, s_size);
		x_hal_bulk_storage_write(i * s_size, s_size, buffer_work);
		memset(buffer_work, 0xff, s_size);
		x_hal_bulk_storage_read(i * s_size, s_size, buffer_work);
		memset(buffer_compare, 0b10101010, s_size);
		if (memcmp(buffer_compare, buffer_work, s_size) != 0) {
			printf("%08x: bit pattern 1 failed\n", i);
			continue;
		}

		// write second bit pattern, read it, compare
		memset(buffer_work, 0b01010101, s_size);
		x_hal_bulk_storage_write(i * s_size, s_size, buffer_work);
		memset(buffer_work, 0xff, s_size);
		x_hal_bulk_storage_read(i * s_size, s_size, buffer_work);
		memset(buffer_compare, 0x00, s_size);
		if (memcmp(buffer_compare, buffer_work, s_size) != 0) {
			printf("%08x: bit pattern 2 failed\n", i);
			continue;
		}
	}

Aside from not being able to call lfs_traverse, everything else seemed to be working fine. I was able to read and write files, and list directories, but any call to lfs_traverse caused the above assert, even after a reboot.

Please let me know if there’s any other information I can provide that will help with this. Happy to help in any way I can.

Thanks, Jason

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Comments: 21 (8 by maintainers)

Most upvoted comments

Hi @geky, thank you for your response!

Answers: Hash: b2124a5ae54717737f6d96cb24e6f31e02158d1e Config:

static struct lfs_config cfg = {
    // block device operations
    .read  = x_read,
    .prog  = x_prog,
    .erase = x_erase,
    .sync  = x_sync,

    // block device configuration
    .lookahead = 256,
    .block_size = 4096,
    .block_count = 4096,
    .prog_size = 256,
    .read_size = 256;
};

Compiler: xtensa-esp32-elf-osx-1.22.0-80-g6c4433a-5.2.0.tar.gz via https://esp-idf.readthedocs.io/en/latest/get-started/macos-setup.html

I am not familiar with qemu, unfortunately. I did find this: https://github.com/Ebiroll/qemu_esp32

In running the block check code I mentioned above I’ve erased the data, which is no problem. It’s all test data at the moment. Next time this happens I will attempt to capture an image of the filesystem.

It does look like maybe it’s starting to happen again, on this fresh format. I just saw lfs debug:1385: Bad block at 785 and I appear to have a reboot from earlier with:

assertion "entry.d.type == LFS_TYPE_REG" failed: file "/Users/jason/Projects/x/x-esp32-firmware/components/littlefs/littlefs/lfs.c", line 1526, function: lfs_file_sync

One odd thing is that I’ve been running LittleFS on this same system for over a week now and these problems have just popped up over the past day.

Thanks to a ton of help from @geky I’ve found this to be a problem in my code. I was calling lfs_traverse without a mutex, while all other lfs_ calls were protected by a mutex. @geky explained that LFS uses a single directory read cache, which was likely getting corrupted by simultaneous reads.

I confirmed the issue by running lfs_traverse in a tight loop on another thread, which corrupted the filesystem in 8 seconds. I then added locking and was not able to reproduce the error.

So, @guillaumerems, make sure you have a mutex around all LFS calls!

I’m going to monitor my device for a few days to make sure this doesn’t happen again, and then will close this issue.

Thank you to @geky for all his help, and for such a great filesystem!

@geky Thanks for the response. I’ve emailed the file to the email you mentioned above.

@geky I’m not sure if I’ve erased the first two blocks in all cases. I was doing quite a bit of stuff, including trying different filesystems, with this particular flash. It’s quite possible this came up after I had experimented with just erasing the first block. I’ve now done a full chip erase on this board and will see if it comes up again.

Oh! The other tests you run, do they erase block 0? What about block 1? The first two erase blocks form the superblock, unfortunately erasing only one can cause problems.

There’s a reoccuring issue I’ve seen where users try to switch between two filesystems on a single piece of storage. If block 0 is erased, LittleFS may pick up block 1, thinking that a power failure occured. Unfortunately, block 1 may contain significantly out of date data, leading to a corrupted filesystem.

So, I guess the question comes down to: Do I need to do a chip erase before formatting to ensure the state of the FS?

No. LittleFS makes no assumption about the state of memory after a format.

But! If your code tries to mount first (which, to be fair, is suggested for handling new flash), and there is a corrupted filesystem. LittleFS may not reformat, mount successfully, but fail later.

So at the moment you do need a full erase (or at least erase the first 2 blocks) after running tests or using another filesystem.

I do have an idea on how to improve this (by checksumming the sum of revision counts on the filesystem). But this would come with a code and runtime cost.

@geky I just had a thought that maybe explains this issue: Would a previously formatted, and used LFS volume, which is then reformatted but not fully erased, potentially cause this issue?

Here’s the situation:

  1. Flash starts erased from factory.
  2. LittleFS formats, writing just a few blocks.
  3. I use LFS for a while, creating some files, allocating blocks, etc.
  4. I corrupt the superblock while doing some other tests.
  5. I attempt to remount, it fails, so I format. I note this only erases and writes a few blocks, not the whole flash.
  6. LFS starts allocating blocks and runs across a block that “looks” right, but it’s actually from a previous format of the FS?

This might explain why I am seeing this error on my development board, which I reformat regularly, but not on other boards that are just in general use.

So, I guess the question comes down to: Do I need to do a chip erase before formatting to ensure the state of the FS?

Thanks @geky.

Lots of remounts, at least 10 per day.

I think I confirmed that there aren’t “really” any bad blocks with my block test code up above. For each block it:

  1. erase the block
  2. read the block and confirm all bytes are 0xff
  3. write 0b10101010 to each byte
  4. write 0b01010101 to each byte
  5. read the block and confirm all bytes are 0x00

Let me know if you think this test is not sufficient.

I will merge the fix you mentioned, and will try -DLFS_NO_INTRINSICS.

In the meantime, I’ll do some testing of my own and see if I can find a way to easily reproduce this.

Thanks for all your help!