esp-idf: HTTP client hits an unexpected, unhandled error that causes esp_https_ota to never return (IDFGH-4543)
Environment
- Development Kit: [none]
- Module or chip used: [ESP32-WROOM-32D]
- IDF version (run git describe --tags to find it): v4.2-47-g2532ddd9f
- Build System: [idf.py]
- Compiler version (run xtensa-esp32-elf-gcc --version to find it): xtensa-esp32-elf-gcc (crosstool-NG esp-2020r3) 8.4.0
- Operating System: [Linux]
- Using an IDE?: [No]
- Power Supply: [external 3.3V]
Problem Description
Sometimes the call esp_err_t ret = esp_https_ota(&config); never returns. When this happens, a few log messages referencing EAGAIN are printed, and no further OTA-related information appears in the log.
Expected Behavior
- esp_https_ota should not block for 10+ hours.
- As a user of esp_https_ota, I expect the library to handle any errors or return a suitable esp_err_t.
- EAGAIN appears to be related to the non-blocking socket API, but esp_https_ota is blocking (as noted here). It therefore seems that EAGAIN should not be a possible errno, even in the internal implementation of esp_https_ota.
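A minimal sketch of the behavior the last point asks for, assuming a transport read that reports "no data yet" with the value mbedTLS uses for MBEDTLS_ERR_SSL_WANT_READ (-0x6900): a blocking wrapper retries that result until a deadline and then returns a distinct timeout code, so the caller never sees a would-block result and never waits forever. The names read_fn, clock_fn, blocking_read, SKETCH_ERR_TIMEOUT and the fake_conn_t test double are hypothetical and are not part of ESP-IDF; the real esp_http_client and esp-tls code paths may be structured differently.

```c
#include <stddef.h>
#include <stdint.h>

/* -0x6900 is the value mbedTLS defines as MBEDTLS_ERR_SSL_WANT_READ;
 * redefined here so the sketch builds without mbedTLS headers. */
#define SKETCH_WANT_READ (-0x6900)
/* Hypothetical timeout code, outside mbedTLS's 16-bit error range. */
#define SKETCH_ERR_TIMEOUT (-0x10000)

/* Hypothetical transport read: >0 bytes read, 0 on orderly close,
 * SKETCH_WANT_READ when no data has arrived yet, other negative = error. */
typedef int (*read_fn)(void *ctx, char *buf, size_t len);
/* Hypothetical monotonic clock in milliseconds. */
typedef int64_t (*clock_fn)(void *ctx);

/* Blocking read: retries "want read" results until timeout_ms elapses,
 * then reports a timeout instead of looping forever. */
int blocking_read(read_fn rd, clock_fn now, void *ctx,
                  char *buf, size_t len, int64_t timeout_ms)
{
    int64_t deadline = now(ctx) + timeout_ms;
    for (;;) {
        int r = rd(ctx, buf, len);
        if (r != SKETCH_WANT_READ) {
            return r; /* data, orderly close, or hard error: pass through */
        }
        if (now(ctx) >= deadline) {
            return SKETCH_ERR_TIMEOUT;
        }
    }
}

/* Test double: reports WANT_READ `stalls` times, then `payload` bytes. */
typedef struct {
    int stalls;   /* remaining WANT_READ results */
    int payload;  /* byte count returned once stalls run out */
    int64_t t_ms; /* fake clock value */
} fake_conn_t;

static int fake_read(void *ctx, char *buf, size_t len)
{
    fake_conn_t *c = ctx;
    (void)buf;
    (void)len;
    if (c->stalls > 0) {
        c->stalls--;
        return SKETCH_WANT_READ;
    }
    return c->payload;
}

static int64_t fake_clock(void *ctx)
{
    fake_conn_t *c = ctx;
    c->t_ms += 1000; /* every clock reading advances 1 s */
    return c->t_ms;
}
```

With timeout_ms = 20000 and the fake clock advancing 1 s per reading, a connection that keeps reporting WANT_READ is abandoned after 20 attempts. A real transport would block inside each read, so the loop would not spin as it does with the fake.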
Actual Behavior
esp_http_client_config_t config = {
.url = image_location,
.cert_pem = (char *)server_cert_pem_start,
.event_handler = _http_event_handler,
.timeout_ms = 20000,
};
esp_err_t ret = esp_https_ota(&config);
ESP_LOGI(TAG, "sometimes the above call never returns");
When esp_https_ota does not return, I get log output that looks like this:
D (08:01:36.759) esp_https_ota: Written image length 1104558
D (08:01:36.766) esp_https_ota: Written image length 1104847
D (08:01:36.772) esp_https_ota: Written image length 1105136
D (08:01:36.779) esp_https_ota: Written image length 1105425
D (08:01:36.785) esp_https_ota: Written image length 1105714
E (08:01:56.782) TRANS_SSL: esp_tls_conn_read error, errno=No more processes
W (08:01:56.783) HTTP_CLIENT: esp_transport_read returned:-26880 and errno:11
D (08:01:56.792) esp_https_ota: Written image length 1105920
No more OTA logs are printed. The MQTT client on the system keeps working.
The issue only occurs on the office Wi-Fi network. On an alternative Wi-Fi network that is used only for development, the issue does not occur.
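The numbers in the HTTP_CLIENT warning can be decoded. -26880 is -0x6900, the value mbedTLS defines as MBEDTLS_ERR_SSL_WANT_READ, and errno 11 is EAGAIN in newlib, whose strerror() text for it is the "No more processes" shown in the TRANS_SSL line. The gap between the last progress line (08:01:36.785) and the error (08:01:56.782) is just under 20 s, which matches timeout_ms = 20000 in the config and suggests the read timed out rather than failing outright. The helper names below are hypothetical; the sketch only checks this arithmetic on a host.

```c
#include <stdio.h>

/* -0x6900 is the value mbedTLS defines as MBEDTLS_ERR_SSL_WANT_READ;
 * redefined here so the sketch builds without mbedTLS headers. */
#define SKETCH_MBEDTLS_WANT_READ (-0x6900)

/* Returns 1 if a transport/TLS return code is mbedTLS's "want read". */
int is_want_read(int ret)
{
    return ret == SKETCH_MBEDTLS_WANT_READ;
}

/* Milliseconds between two "HH:MM:SS.mmm" log timestamps taken on the
 * same day; returns -1 if either timestamp cannot be parsed. */
long log_gap_ms(const char *earlier, const char *later)
{
    int h1, m1, s1, ms1, h2, m2, s2, ms2;
    if (sscanf(earlier, "%d:%d:%d.%d", &h1, &m1, &s1, &ms1) != 4 ||
        sscanf(later, "%d:%d:%d.%d", &h2, &m2, &s2, &ms2) != 4) {
        return -1;
    }
    long t1 = ((h1 * 60L + m1) * 60L + s1) * 1000L + ms1;
    long t2 = ((h2 * 60L + m2) * 60L + s2) * 1000L + ms2;
    return t2 - t1;
}
```

For the log above, log_gap_ms("08:01:36.785", "08:01:56.782") gives 19997 ms.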
Steps to reproduce
I have not found a reliable way to reproduce this issue. Since it depends on the Wi-Fi network used, I expect it to be tricky to reproduce. When testing in my environment, I ran the code above on startup and patched esp_https_ota to call esp_restart before esp_https_ota_finish. This results in a system that repeatedly downloads the OTA image and then reboots. Eventually, it hangs in esp_https_ota as described above.
Other items
- It seems that someone in the forum has a similar issue: https://www.esp32.com/viewtopic.php?t=17732
- Issue #4394 appears to be closely related: it shows the same errno in the log, and it has a commit that fixed the issue 2 years ago.
The cause may be a lack of error handling in esp_https_ota. Alternatively, the HTTP client may be hitting an error that should not occur at all.
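One possible caller-side mitigation, sketched under assumptions: instead of the single esp_https_ota() call, drive the download with esp_https_ota_begin(), esp_https_ota_perform() and esp_https_ota_finish(), and give up when the number of bytes read stops growing for too long. The watchdog logic below is host-runnable C; ota_watchdog_t and its functions are hypothetical names, and stall_ms is a caller-chosen limit. It only helps if esp_https_ota_perform() itself keeps returning; whether it does in the failing case is not established by this report, and a call that is stuck inside would have to be detected from another task instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical progress watchdog: reports a stall when the number of
 * bytes read has not changed for more than stall_ms milliseconds. */
typedef struct {
    int last_len;           /* bytes read at the last observed progress */
    int64_t last_change_ms; /* time of the last observed progress */
    int64_t stall_ms;       /* allowed time without progress */
} ota_watchdog_t;

/* Start watching at time now_ms with the given stall limit. */
void ota_watchdog_init(ota_watchdog_t *wd, int64_t now_ms, int64_t stall_ms)
{
    wd->last_len = 0;
    wd->last_change_ms = now_ms;
    wd->stall_ms = stall_ms;
}

/* Feed the current byte count; returns true once progress has stalled. */
bool ota_watchdog_stalled(ota_watchdog_t *wd, int len, int64_t now_ms)
{
    if (len != wd->last_len) {
        wd->last_len = len; /* progress: restart the stall timer */
        wd->last_change_ms = now_ms;
        return false;
    }
    return now_ms - wd->last_change_ms > wd->stall_ms;
}
```

On the device, the watchdog would be fed from the loop around esp_https_ota_perform(), using esp_https_ota_get_image_len_read() for the length and esp_timer_get_time() / 1000 for the time; check that your IDF version's headers provide these calls.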
Please let me know if there is any additional information I can provide.
About this issue
- State: closed
- Created 3 years ago
- Comments: 33 (13 by maintainers)
@softdel1003 I don't think the OTA failed; you can check line 11076. The OTA still works.
@arntdr This has been merged internally; it will likely appear on GitHub with the next codebase sync (an automated process, which should happen in the next few days). It will be part of the v4.3 release, expected roughly towards the end of February.
The keep_idle time has nothing to do with the connection timeout. keep_idle is only checked when the PCB is in the ESTABLISHED or CLOSE_WAIT state.
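To illustrate the point about keep_idle, the sketch below enables TCP keepalive with the standard socket options. Probes are only sent on an established connection that has been idle for idle_s seconds, so these settings cannot bound how long connecting takes; they only bound how long an idle, silently dead connection goes unnoticed, roughly idle_s + intvl_s * cnt seconds. The sketch targets Linux/POSIX sockets; lwIP in ESP-IDF accepts the same option names when TCP keepalive support is enabled, but treat that as an assumption to verify for a given IDF configuration. enable_keepalive and keepalive_detect_s are hypothetical helper names.

```c
/* Expose TCP_KEEPIDLE and friends from glibc's <netinet/tcp.h>. */
#define _DEFAULT_SOURCE
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable keepalive on fd: first probe after idle_s seconds without
 * traffic, then one probe every intvl_s seconds, giving up after cnt
 * unanswered probes. Returns 0 on success, -1 on the first failure. */
int enable_keepalive(int fd, int idle_s, int intvl_s, int cnt)
{
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof intvl_s) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof cnt) != 0)
        return -1;
    return 0;
}

/* Approximate worst-case time to notice a silent peer on an idle
 * connection: the idle period plus one interval per unanswered probe. */
int keepalive_detect_s(int idle_s, int intvl_s, int cnt)
{
    return idle_s + intvl_s * cnt;
}
```

For example, idle_s = 5, intvl_s = 5 and cnt = 3 detect a dead idle peer after about 20 s, but none of these values affect how long a connect attempt may take.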