esp-idf: Unexpectedly high heap allocations during HTTPS OTA (IDFGH-3276)

Environment

  • Module or chip used: bare WROOM-32D
  • IDF version: v4.2-dev-1320-g1aebfdf6a
  • Using Espressif VS Code extension, compiler esp-2020r1-8.2.0

Problem Description

While testing HTTPS OTA, I checked its impact on heap memory in my program by logging esp_get_free_heap_size() and esp_get_minimum_free_heap_size() before and after the update. The minimum free heap had gone from 115K beforehand to under 10K afterwards. Needless to say, roughly 100K of heap usage far exceeds the overhead I expected.

I have reproduced it somewhat with simple_ota_example. The allocations seem to increase with the time taken to erase the partition, so I contrived this partition table with a very large ota_0 to exaggerate the effect:

# Name,   Type, SubType, Offset,    Size,   Flags
phy_init, data, phy,     0xC000,    0x1000,
factory,  app,  factory, 0x10000,   0xEC000,
nvs,      data, nvs,     0xFC000,   0x2000,
otadata,  data, ota,     0xFE000,   0x2000,
ota_0,    app,  ota_0,   0x100000,  0x2F0000,
ota_1,    app,  ota_1,   0x3F0000,  0x10000,
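To put a number on how much erase work this contrived table creates, here is a small host-side sketch that counts 4 KB flash sectors per partition. The struct and function names are hypothetical (they are not ESP-IDF APIs); the offsets and sizes are copied from the table above, and the sketch assumes the whole partition is erased, which is what esp_ota_begin does when the image size is unknown.

```c
#include <stddef.h>
#include <stdint.h>

#define FLASH_SECTOR_SIZE 0x1000u  /* 4 KB erase sector on ESP32 SPI flash */

/* Hypothetical mirror of one row of the partition table above. */
typedef struct {
    const char *name;
    uint32_t offset;
    uint32_t size;
} part_entry_t;

static const part_entry_t PART_TABLE[] = {
    {"phy_init", 0x00C000, 0x001000},
    {"factory",  0x010000, 0x0EC000},
    {"nvs",      0x0FC000, 0x002000},
    {"otadata",  0x0FE000, 0x002000},
    {"ota_0",    0x100000, 0x2F0000},
    {"ota_1",    0x3F0000, 0x010000},
};
#define PART_COUNT (sizeof(PART_TABLE) / sizeof(PART_TABLE[0]))

/* Number of sectors touched when the whole partition is erased (rounded up). */
uint32_t part_sector_count(const part_entry_t *p)
{
    return (p->size + FLASH_SECTOR_SIZE - 1u) / FLASH_SECTOR_SIZE;
}

/* Returns 1 if no entry starts before the previous entry ends (table sorted by offset). */
int part_table_no_overlap(const part_entry_t *t, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (t[i - 1].offset + t[i - 1].size > t[i].offset) {
            return 0;
        }
    }
    return 1;
}
```

With these values, ota_0 spans 752 sectors against 16 for ota_1, which is why erasing ota_0 exaggerates whatever happens during the erase.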

After inserting some logging, e.g.:

ESP_LOGW(TAG, "Starting: %u, min: %u", esp_get_free_heap_size(), esp_get_minimum_free_heap_size());

Here’s the relevant output:

I (4206) simple_ota_example: Starting OTA example
W (4206) simple_ota_example: Starting: 231256, min: 222080
W (4216) HTTP_CLIENT: Before http connect: 229288, min: 222080
W (5486) HTTP_CLIENT: After http connect: 196928, min: 187696
I (5516) esp_https_ota: Starting OTA...
I (5526) esp_https_ota: Writing to partition subtype 16 at offset 0x100000
W (5526) esp_https_ota: Before esp_ota_begin: 196596, min: 187696
W (5526) esp_ota_ops: Before esp_partition_erase_range: 191328, Min: 187696
E (10496) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (10546) task_wdt: - IDLE0 (CPU 0)
E (10546) task_wdt: Tasks currently running:
E (10546) task_wdt: CPU 0: ipc0
E (10546) task_wdt: CPU 1: IDLE1
W (12886) esp_ota_ops: After esp_partition_erase_range: 153004, Min: 114116
W (12886) esp_https_ota: After esp_ota_begin: 159756, min: 114116

The task watchdog trigger is to be expected during such a long erase, as are the allocations for mbedTLS and the HTTP client (~40K). However, note that the minimum fell to just 114K at some point during esp_partition_erase_range, indicating that a further ~70K was transiently allocated and is unaccounted for.
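The arithmetic behind that figure can be sketched as a pair of heap snapshots taken around the erase. The helper names below are hypothetical; only esp_get_free_heap_size() and esp_get_minimum_free_heap_size() are real ESP-IDF calls, and the host-side stand-ins simply return the "Before esp_partition_erase_range" values from the log so the sketch also builds off-target.

```c
#include <stdint.h>

#ifdef ESP_PLATFORM
#include "esp_system.h"  /* esp_get_free_heap_size(), esp_get_minimum_free_heap_size() */
#else
/* Host stand-ins (assumption): fixed values copied from the log above. */
static uint32_t esp_get_free_heap_size(void) { return 191328u; }
static uint32_t esp_get_minimum_free_heap_size(void) { return 187696u; }
#endif

/* Hypothetical snapshot of the two heap counters at one checkpoint. */
typedef struct {
    uint32_t free_bytes;      /* free heap right now */
    uint32_t min_free_bytes;  /* lowest free heap seen since boot */
} heap_snapshot_t;

heap_snapshot_t heap_snapshot_take(void)
{
    heap_snapshot_t s = { esp_get_free_heap_size(), esp_get_minimum_free_heap_size() };
    return s;
}

/* How much the low-water mark fell between two checkpoints (0 if it did not move). */
uint32_t heap_watermark_drop(heap_snapshot_t before, heap_snapshot_t after)
{
    if (after.min_free_bytes >= before.min_free_bytes) {
        return 0;
    }
    return before.min_free_bytes - after.min_free_bytes;
}

/* How far free heap dipped below its level at `before`.
 * Only meaningful when a new low was recorded in between; otherwise returns 0. */
uint32_t heap_dip_between(heap_snapshot_t before, heap_snapshot_t after)
{
    if (after.min_free_bytes >= before.min_free_bytes) {
        return 0;
    }
    return before.free_bytes - after.min_free_bytes;
}
```

Fed the before/after values of esp_partition_erase_range from the log, the low-water mark drops by 73580 bytes, and free heap dips 77212 bytes below where it stood when the erase started. Because the minimum is a running value since boot, a result of 0 only means no new low was set, not that nothing was allocated.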

For what it’s worth, here is similar output from my more complex program, where the minimum free heap fell to just 1624 bytes at some stage during esp_ota_begin:

W (23344) MY-OTA: Free: 115728, Min: 105352
W (23344) HTTP_CLIENT: Before http connect: 115728, min: 105352
W (23854) HTTP_CLIENT: After http connect: 71104, min: 61844
I (23974) MY-OTA: Starting OTA...
W (23974) MY-OTA: Free: 62456, Min: 61844
I (23974) MY-OTA: Writing to partition subtype 16 at offset 0x180000
W (23974) esp_ota_ops: Before esp_partition_erase_range: 62456, Min: 61844
W (27244) esp_ota_ops: After esp_partition_erase_range: 12472, Min: 12472
I (27244) MY-OTA: esp_ota_begin succeeded
I (27244) MY-OTA: Please Wait. This may take time
W (27254) MY-OTA: Free: 60172, Min: 1624

My concern is that OTA downloads can take quite some time, and I have other concurrent tasks (in this case, WiFi is in APSTA mode with an httpd server up), so I am not at all confident of stability if OTA is sporadically consuming 100K+ of memory.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 15 (14 by maintainers)

Most upvoted comments

@boarchuz

I believe this is what you were suggesting but I may have misunderstood. With my poor config, that was a great improvement.

Yes (that’s what I was suggesting).

Considering that WiFi is of course very active throughout the download, it’s natural for the heap to be fluctuating for some time, so with a decent configuration there isn’t as much to gain by reordering (with my current setup,

Agreed, and I can confirm that observations on my side are similar to yours. So we are good with respect to the https_ota state machine, and no change in ordering is required here. If there are no follow-up questions from your side, we can close this issue. (IMO, PR https://github.com/espressif/esp-idf/pull/5246 is still good to have, but I will follow up on that separately.)

Thanks.

@boarchuz

Re-opening the issue; there are a couple of things we would like to check here:

  • Just wondering if esp_ota_begin is correctly placed in the https ota state machine; we will check on this. If we erase the partition before beginning the TLS connection, it may reduce memory pressure. The downside is that the erase may happen even if the upgrade was not desired (since the upgrade version is part of the TLS stream itself).

  • PR https://github.com/espressif/esp-idf/pull/5246 can also be helpful here.

We will keep this issue updated after some investigation on our side.

Thanks.
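As a rough illustration of the reordering idea in the comment above, here is a toy model of peak heap usage when the update phases run in different orders. Everything in it is an assumption for illustration: the type and function names are hypothetical, the byte counts in the usage note are loosely derived from the simple_ota_example log, and it assumes the transient allocations during the erase are the same size whether or not the TLS connection is already open, which this thread does not establish.

```c
#include <stddef.h>
#include <stdint.h>

/* One phase of the update (hypothetical model, not an ESP-IDF type). */
typedef struct {
    const char *name;
    uint32_t resident_bytes;   /* still allocated after the phase completes */
    uint32_t transient_bytes;  /* extra memory held only while the phase runs */
} ota_phase_t;

/* Peak heap usage when the phases run one after another in array order. */
uint32_t ota_peak_usage(const ota_phase_t *phases, size_t n)
{
    uint32_t held = 0;  /* resident memory accumulated by earlier phases */
    uint32_t peak = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t during = held + phases[i].resident_bytes + phases[i].transient_bytes;
        if (during > peak) {
            peak = during;
        }
        held += phases[i].resident_bytes;
    }
    return peak;
}
```

For example, with a connect phase of 32360 resident bytes and an erase phase of 77212 transient bytes, connect-then-erase peaks at 109572 bytes while erase-then-connect peaks at 77212 bytes. The model ignores the downside raised above (erasing before the upgrade version is known) and says nothing about background WiFi buffer churn.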