zephyr: mgmt: img: nxp: rt1060: image upload causes a silent reset

Describe the bug The image upload command can cause a non-deterministic silent reset. Something as simple as adding a print statement can make the issue disappear, probably only hiding it by shifting the timing slightly. The reset occurs at a random point during the upload, so the amount of data that has been transferred and written to the flash differs each time.

  • What target platform are you using?
    • NXP MIMXRT1060
  • What have you tried to diagnose or workaround this issue?
    • Adding/removing code can solve (hide) the issue making it difficult to diagnose.

To Reproduce I have made a small example that reproduces the issue. On 3.4.0 the issue was very reproducible, as the reset happened basically every time. I have now built an example on 3.5.0 to confirm that the issue persists. It does not provoke the silent reset every time, but it does so most of the time. The example is built upon the hello_world example on a separate fork:

https://github.com/zephyrproject-rtos/zephyr/commit/e120afeb8ed4060dceef8c16bdde03575067dcf2

Steps to reproduce the behavior:

  1. Check out my fork of zephyr with the modified hello_world example
  2. Build a bootloader using the partition overlay that is in the hello_world example
  3. Build the hello_world example
  4. Flash both firmwares
  5. Use any SMP client (mcumgr) to upload the image. The silent reset can occur at any time during the upload. (It may take a couple of tries before it happens.)

Expected behavior The firmware should be uploaded successfully without a reset.

Logs and console output

Got Sensor Sample Data
Got Imu Sample Data
Got Sensor Sample Data <------- sudden reset
*** Booting Zephyr OS build zephyr-v3.5.0-1-ge120afeb8ed4 ***
Hello World! mimxrt1060_evk
TestTestTestTestTestTestTestTestTestTestTestTestTestTestTestTestTestTestTestTestTestTestTt
SensorMock Init
Got Imu Sample Data
Got Sensor Sample Data
Got Sensor Sample Data


Additional context The firmware is built for XIP, and the image upload command does a full erase before writing the new firmware.

Another example is based on a commit from the main branch from around August. It reproduces the issue even more reliably, as the reset happens every time the upload command is executed:

https://github.com/zephyrproject-rtos/zephyr/commit/2179732e81421a37c9b095a64271ac2ccb873196

About this issue

  • State: closed
  • Created 8 months ago
  • Comments: 19 (18 by maintainers)

Most upvoted comments

Hi @Cladoc , If you can replicate the issue during the image update, then you can test it with the updated driver from @danieldegrasse , and see if that resolves the issue. Thanks

"is there any Kconfig flag that we have missed that ensures these are always placed in RAM?"

Hi @Cladoc ,

I am not aware of a Kconfig to do that. But if you have a solution, you are welcome to contribute it as a Pull Request.

Best regards

Hi @DerekSnell, An additional question. We had some discussions regarding the memory placement of files concerning fault handling when performing operations like these. With the cache disabled I moved these to ITCM and occasionally managed to get some useful information on the type of fault occurring and the precise instruction addresses.

I do realize this is not strictly NXP related, but is there any Kconfig flag that we have missed that ensures these are always placed in RAM, or specifically ITCM/DTCM? If not, do you know whether adding such an option has been considered for the future, for ease of use, instead of manually relocating all concerned files? I believe the logs provided by fault handlers and the chance to grab the halted processor with a debugger are crucial for reducing debugging time when an issue such as this occurs.

Best Regards

Hi @kunoh , We have identified a potential issue in the flash driver that may be causing your issue. Disabling interrupts or the cache will not prevent this issue. BTW, the flash driver does disable interrupts during the flash operations.

There is some code in the driver that reads the device structure during the flash operations, and that struct is located in flash. This appears to lead to a Read-While-Write situation, where the CPU attempts to read from the flash while an erase or program operation is in progress.

NXP will work to fix this driver to avoid that issue. New issue created at https://github.com/zephyrproject-rtos/zephyr/issues/64702. And since this issue is challenging to replicate, we would like your help to test the update, and see if it resolves your issue.

In the meantime, if you want to try to work around this issue, the structs used by the driver need to be placed in RAM. You can refer to the changes in https://github.com/zephyrproject-rtos/zephyr/commit/e58d0c3bb5a7d3e48bd60697578fa43797617c8 where this was done in the HyperFlash driver.
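As a rough illustration of the workaround described above, the sketch below copies a flash-resident configuration struct into RAM once at init, so that nothing running during an erase or program operation has to read it from flash. This is not the code from the referenced commit: the struct, its fields, the values, and the function names are all hypothetical, and it assumes a typical XIP link where `const` data lands in flash and non-const static data lands in RAM.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical driver configuration; field names are illustrative only. */
struct flash_cfg {
    uint32_t base_addr;
    uint32_t sector_size;
};

/* const data is normally placed in flash on an XIP build (assumption). */
static const struct flash_cfg flash_cfg_rom = {
    .base_addr = 0x60000000u, /* illustrative value */
    .sector_size = 4096u,     /* illustrative value */
};

/* Non-const static data is placed in RAM (.bss). */
static struct flash_cfg flash_cfg_ram;

/* Call once at init, before any erase or program operation starts. */
void flash_cfg_cache_to_ram(void)
{
    memcpy(&flash_cfg_ram, &flash_cfg_rom, sizeof(flash_cfg_ram));
}

/* Code that runs during a flash operation reads only the RAM copy. */
uint32_t flash_sectors_for(uint32_t len)
{
    return (len + flash_cfg_ram.sector_size - 1u) / flash_cfg_ram.sector_size;
}
```

The copy only removes the data reads from flash. The functions that execute during the operation would also have to run from RAM, which this sketch does not show.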

Thank you for raising this to us.

Hi @DerekSnell ,

We are a bit in doubt whether we are disabling the cache correctly. We have tried disabling CONFIG_DCACHE and CONFIG_ICACHE, but the Configuration and Control Register, CCR, still shows that the I-cache and D-cache are enabled. With these two configs disabled we cannot reproduce the issue, so something is changing. Is this the correct way to disable the caches, or can you let us know how it should be done?
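For reference when checking the CCR value mentioned above, here is a minimal sketch of decoding the cache enable bits. It assumes the ARMv7-M layout used by the Cortex-M7 (DC is bit 16, IC is bit 17, matching the CMSIS definitions); the macro and function names are hypothetical. The register value is passed in as a parameter so the helpers compile anywhere; on the target it would be the value read from SCB->CCR.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-M CCR bit positions (assumed to match CMSIS SCB_CCR_DC/IC). */
#define CCR_DC_BIT (1u << 16) /* data cache enable */
#define CCR_IC_BIT (1u << 17) /* instruction cache enable */

/* Returns true if the D-cache enable bit is set in the given CCR value. */
bool ccr_dcache_enabled(uint32_t ccr)
{
    return (ccr & CCR_DC_BIT) != 0u;
}

/* Returns true if the I-cache enable bit is set in the given CCR value. */
bool ccr_icache_enabled(uint32_t ccr)
{
    return (ccr & CCR_IC_BIT) != 0u;
}
```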

UPDATED: We figured out how to disable both ICache and DCache using these configs: CONFIG_IMXRT1XXX_CODE_CACHE=n and CONFIG_IMXRT1XXX_DATA_CACHE=n

If we only disable ICache, we are not able to reproduce the issue. If we only disable DCache, we are not able to reproduce the issue. If we disable both, we can reproduce the issue sometimes, but it’s not as consistent as when they are enabled.

Regarding disabling global interrupts, we have not tried this yet and will get back to you once we know more.

Thank you.