zephyr: mgmt: img: nxp: rt1060: image upload causes a silent reset
Describe the bug The image upload command can cause a non-deterministic silent reset. Adding something as simple as a print statement can make the issue disappear, but this probably just hides it by shifting the timing a bit. The reset occurs at a random point during the upload, so the amount of data that has been transferred and written to flash differs from run to run.
- What target platform are you using?
- NXP MIMXRT1060
- What have you tried to diagnose or workaround this issue?
- Adding/removing code can solve (hide) the issue making it difficult to diagnose.
To Reproduce I have made a small example that reproduces the issue. The issue was very reproducible on 3.4.0, where the reset happened basically every time. I have now made an example based on 3.5.0 to confirm that the issue persists. It does not provoke the silent reset every time, but it does most of the time. The example is built upon the hello_world sample on a separate fork:
https://github.com/zephyrproject-rtos/zephyr/commit/e120afeb8ed4060dceef8c16bdde03575067dcf2
Steps to reproduce the behavior:
- Checkout my fork of zephyr with the modified hello_world example
- Build a bootloader using the partition overlay that is in the hello_world example
- Build the hello_world example
- Flash both firmwares
- Use any SMP client (e.g. mcumgr) to upload the image
- The silent reset can occur at any time during the upload (it may take a couple of tries before it happens)
Expected behavior The firmware should be uploaded successfully without doing a reset
Logs and console output
Got Sensor Sample Data
Got Imu Sample Data
Got Sensor Sample Data <------- sudden reset
*** Booting Zephyr OS build zephyr-v3.5.0-1-ge120afeb8ed4 ***
Hello World! mimxrt1060_evk
TestTestTestTestTestTestTestTestTestTestTestTestTestTestTestTestTestTestTestTestTestTestTt
SensorMock Init
Got Imu Sample Data
Got Sensor Sample Data
Got Sensor Sample Data
Environment (please complete the following information):
- OS:
- Ubuntu 22.04
- Toolchain (e.g Zephyr SDK, …)
- Zephyr SDK 0.16.3
- Commit SHA or Version used
Additional context The firmware is built for XIP and the image upload command will do a full erase before writing the new firmware
Here is another example, based on a commit from the main branch from around August. It reproduces the issue even better, as the reset happens every time the upload command is executed:
https://github.com/zephyrproject-rtos/zephyr/commit/2179732e81421a37c9b095a64271ac2ccb873196
About this issue
- State: closed
- Created 8 months ago
- Comments: 19 (18 by maintainers)
Hi @Cladoc , If you can replicate the issue during the image update, then you can test it with the updated driver from @danieldegrasse , and see if that resolves the issue. Thanks
I am not aware of a Kconfig to do that. But if you have a solution, you are welcome to contribute it as a Pull Request.
Best regards
Hi @DerekSnell, An additional question. We had some discussions regarding the memory placement of files concerning fault handling when performing operations like these. With the cache disabled I moved these to ITCM and occasionally managed to get some useful information on the type of fault occurring and the precise instruction addresses.
I do realize this is not strictly NXP related, but is there any Kconfig flag that we have missed that ensures these are always placed in RAM, or specifically ITCM/DTCM? If not, do you know if adding such an option has been considered for the future, for ease of use instead of manually relocating all concerned files? I believe the logs provided by fault handlers and the chance to grab the halted processor with a debugger are crucial in terms of reducing debugging time when an issue such as this occurs.
Best Regards
Hi @kunoh , We have identified a potential issue in the flash driver that may be causing your issue. Disabling interrupts or the cache will not prevent this issue. BTW, the flash driver does disable interrupts during the flash operations.
There is some code in the driver that reads the device structure during the flash operations, and that struct is located in flash. This appears to lead to a Read-While-Write situation, where the CPU attempts to read from the flash while an erase or program operation is in progress.
NXP will work to fix this driver to avoid that issue. New issue created at https://github.com/zephyrproject-rtos/zephyr/issues/64702. And since this issue is challenging to replicate, we would like your help to test the update, and see if it resolves your issue.
In the meantime, if you want to try to work around this issue, the structs used by the driver need to be placed in RAM. You can refer to the changes in https://github.com/zephyrproject-rtos/zephyr/commit/e58d0c3bb5a7d3e48bd60697578fa43797617c8 where this was done in the HyperFlash driver.
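To illustrate the general idea only (this is not the actual driver change from the commit above), here is a minimal sketch of keeping a RAM copy of configuration data so that nothing is read from flash while an erase or program operation is running. The `flash_cfg` struct, its fields, and the function names are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical driver configuration; the real driver's structs differ. */
struct flash_cfg {
    uint32_t page_size;
    uint32_t sector_size;
    uint8_t erase_value;
};

/* 'const' data normally ends up in .rodata, which lives in flash on an XIP build. */
static const struct flash_cfg cfg_in_flash = {
    .page_size = 256,
    .sector_size = 4096,
    .erase_value = 0xFF,
};

/* Non-const, so the linker places this copy in RAM. */
static struct flash_cfg cfg_in_ram;

/* Take the RAM snapshot once, before any erase or program operation starts. */
void flash_cfg_snapshot(void)
{
    memcpy(&cfg_in_ram, &cfg_in_flash, sizeof(cfg_in_ram));
}

/* Code that runs while the flash is busy should only touch the RAM copy. */
uint32_t pages_per_sector(void)
{
    assert(cfg_in_ram.page_size != 0); /* snapshot must have been taken first */
    return cfg_in_ram.sector_size / cfg_in_ram.page_size;
}
```

On a real XIP target the functions that execute during the flash operation would also have to run from RAM, which this host-side sketch does not show.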
Thank you for raising this to us.
Hi @DerekSnell ,
We are a bit in doubt if we are disabling the cache correctly. We have tried disabling CONFIG_DCACHE and CONFIG_ICACHE, but the Configuration and Control Register, CCR, still shows that both the I-cache and the D-cache are enabled. With these two configs disabled we cannot reproduce the issue, so something is changing. Is this the correct way to disable the caches, or can you let us know how it should be done?
UPDATED We figured out how to disable both ICache and DCache using these configs:
CONFIG_IMXRT1XXX_CODE_CACHE=n and CONFIG_IMXRT1XXX_DATA_CACHE=n
If we only disable ICache, we are not able to reproduce the issue. If we only disable DCache, we are not able to reproduce the issue. If we disable both, we can reproduce the issue sometimes, but not as consistently as when they are enabled.
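For anyone wanting to verify the cache state the same way, the check against the CCR can be sketched as below. The bit positions are the architectural ones for the Cortex-M7 (DC is bit 16, IC is bit 17). The helper names are hypothetical, and on target the value passed in would be read from `SCB->CCR` via CMSIS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Cache enable bits in the Cortex-M7 Configuration and Control Register. */
#define CCR_DC_BIT 16u /* data cache enable */
#define CCR_IC_BIT 17u /* instruction cache enable */

/* Returns true if the D-cache enable bit is set in the given CCR value. */
bool ccr_dcache_enabled(uint32_t ccr)
{
    return ((ccr >> CCR_DC_BIT) & 1u) != 0u;
}

/* Returns true if the I-cache enable bit is set in the given CCR value. */
bool ccr_icache_enabled(uint32_t ccr)
{
    return ((ccr >> CCR_IC_BIT) & 1u) != 0u;
}
```

Printing both results early in `main()` is one way to confirm that the Kconfig options actually took effect on the running image.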
Regarding disabling global interrupts, we have not tried this yet and will get back to you once we know more.
Thank you.