esp-idf: System time corrupted across reboots (IDFGH-7930)

Environment

  • Module or chip used: ESP32 on TTGO, but happens on other devices as well
  • IDF version: 4.4.1 release
  • Build System: Standard Windows VSCode installation via the tutorial
  • Power Supply: USB on the TTGO, but other devices with their own supply behave the same

Problem Description

After leaving it running for a while (1-3 days) if the device is soft-reset using esp_restart() the system time is wrong. Sometimes only by seconds, sometimes by hours, in both directions. SNTP will correct this later, however during the initial phase system time cannot be relied on. The phenomenon gets worse the longer the device is on (= the more time has passed since the last hard-reset).

Interestingly, esp_rtc_get_time_us() which indicates the time since the last hard-reset also jumps the same amount.

There is a four year old forum thread indicating the same problem: https://www.esp32.com/viewtopic.php?t=7544

Expected Behavior

System time running continuously across reboots.

Code to reproduce this issue

Make sure to switch on system time stamps for logging!

Leave the example running for 1 day or more and from time to time look at the output. I used PuTTY since the IDF monitor causes a hard-reset on reconnect. An automated reboot will happen once per hour. Compare the time stamps right before and after the restarts.

#include "esp_log.h"
#include "esp_sntp.h"
#include "esp_wifi.h"
#include "nvs_flash.h"
#include "esp32/rtc.h"
 
static const char *TAG = "main";
 
void time_sync_notification_cb(struct timeval *tv) {
    ESP_LOGI(TAG, "time_sync_notification_cb");
}
 
void log_rtc(void) {
    int u = esp_rtc_get_time_us() / 1000000;
    ESP_LOGI(TAG, "rtc=%d seconds (%d days %02d:%02d:%02d)", u, u / (24*3600), (u % (24*3600)) / 3600, (u % 3600) / 60, u % 60);
}
 
void app_main(void) {
    log_rtc();
    ESP_LOGI(TAG, "start");
 
    ESP_ERROR_CHECK(nvs_flash_init());
    ESP_ERROR_CHECK(esp_netif_init());
    ESP_ERROR_CHECK(esp_event_loop_create_default());
    esp_netif_create_default_wifi_sta();
    wifi_init_config_t wifi_init_config = WIFI_INIT_CONFIG_DEFAULT();
    ESP_ERROR_CHECK(esp_wifi_init(&wifi_init_config));
 
    wifi_config_t conf_sta = {
        .sta.ssid = "xxxxx",
    };
    esp_wifi_set_config(WIFI_IF_STA, &conf_sta);
 
    ESP_ERROR_CHECK(esp_wifi_start());
    esp_wifi_connect();
 
    sntp_setoperatingmode(SNTP_OPMODE_POLL);
    sntp_setservername(0, "pool.ntp.org");
    sntp_set_time_sync_notification_cb(time_sync_notification_cb);
    sntp_init();
 
    // Wait an hour
    for (int i = 0; i < 60; i++) {
        ESP_LOGI(TAG, "%d", i);
        vTaskDelay(60000 / portTICK_PERIOD_MS);
    }
 
    ESP_LOGI(TAG, "end");
    log_rtc();
 
    esp_restart();
//  esp_abort(); // TODO: Try the other "restart methods" and see if there is a difference
//  esp_deep_sleep(1000000);
}

Debug Logs

I (08:07:52.302) main: 58
I (08:08:52.302) main: 59
I (08:09:52.302) main: end
I (08:09:52.303) main: rtc=32399 seconds (0 days 08:59:59)
[...]
ets Jun 8 2016 00:22:57
rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
[...]
I (08:09:15.296) main: rtc=32362 seconds (0 days 08:59:22)
I (08:09:15.301) main: start

Restarting takes maybe one second or so, but as you can see in this case time jumped backwards more than half a minute. Which should be impossible.

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 37 (12 by maintainers)

Commits related to this issue

Most upvoted comments

Looks very good. No problems so far.

Hi! v5.0 is not merged yet but it is ready to go. For the rest (4.4, 4.3) branches, the backport will not be provided according to backport policy.

Hi @kriegste! Thank you for the test example. I will check it and fix this issue.

For the rest (4.4, 4.3) branches, the backport will not be provided according to backport policy.

This is a BUGFIX. no idea why your policy does not support bugfix in maintenance period. I use esp-idf for several years, my experience is the releases in service period are not really stable. Once a release reach more stable state it’s almost in maintenance period, however it’s difficult to get bug fix for releases in maintenance period.

BTW, the regression was reported on Feb 28 (https://github.com/espressif/esp-idf/issues/9448#issuecomment-1448253787) , (I have been waiting the fix for v4.3 for 3 months https://github.com/espressif/esp-idf/issues/9448#issuecomment-1501384081)

@kriegste @boribosnjak, Sorry for the flaw, yes, it is possible. I will try to find a way to fix it. I reopen this issue.

Hi @AxelLin! Sorry for the delay. It was on review for a long time, now it is merged and soon will appear here (backports as well).

Hello @kriegste and @KonstantinKondrashov ! Thanks for your work on this. I hope it’s ok to resurrect this in the initial issue – I wanted to alert you to what I believe may be an issue regarding this change in the context of OTA firmware updates.

The issue we’re seeing is that changing s_esp_rtc_time_us and s_rtc_last_ticks to be RTC_NOINIT_ATTR and initializing them in esp_rtc_get_time_us() only works if the slow clock calibration register is zero. That is fine on power-up, but I believe that after an OTA update that register will still be set from before the update. Consequently, s_esp_rtc_time_us and s_rtc_last_ticks will not be initialized, and if variables in RTC RAM have been moved around by the linker, they will be filled with garbage data. Any reset other than OTA would work fine because the variables would still be initialized from the initial bootup.

I think we saw this on our firmware, and it resulted in incorrect values being reported by esp_rtc_get_time_us(). I’d be curious if you agree with this observation or if I’m off-base somehow.

Ideally during an OTA our system clock could stay valid, but at the least I’d like to avoid the undefined behavior we’re currently experiencing.

We see exactly the same problem. @kriegste and @KonstantinKondrashov from our point of view the change fixes the issue discussed in this thread. Unfortunately it created an issue after a firmware update, when addresses of the variables have changed.

Hello @kriegste and @KonstantinKondrashov ! Thanks for your work on this. I hope it’s ok to resurrect this in the initial issue – I wanted to alert you to what I believe may be an issue regarding this change in the context of OTA firmware updates.

The issue we’re seeing is that changing s_esp_rtc_time_us and s_rtc_last_ticks to be RTC_NOINIT_ATTR and initializing them in esp_rtc_get_time_us() only works if the slow clock calibration register is zero. That is fine on power-up, but I believe that after an OTA update that register will still be set from before the update. Consequently, s_esp_rtc_time_us and s_rtc_last_ticks will not be initialized, and if variables in RTC RAM have been moved around by the linker, they will be filled with garbage data. Any reset other than OTA would work fine because the variables would still be initialized from the initial bootup.

I think we saw this on our firmware, and it resulted in incorrect values being reported by esp_rtc_get_time_us(). I’d be curious if you agree with this observation or if I’m off-base somehow.

Ideally during an OTA our system clock could stay valid, but at the least I’d like to avoid the undefined behavior we’re currently experiencing.

Hi @someburner! Sure. The v5.0 backport is ready and will be merged soon. Thanks.

The backports are ready and will be merged soon.

Finally, I got the log that shows this issue.

1) boot
I (3540336) main: 59
I (3600336) main: end
I (3600336) main: rtc=3638 seconds (0 days 01:00:38)
ets Jul 29 2019 12:21:46
rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
I (326) main: rtc=3639 seconds (0 days 01:00:39) ------------------------------------   = +1sec
I (336) main: start
I (336) main: 0

2) boot
I (3540336) main: 59
I (3600336) main: end
I (3600336) main: rtc=7240 seconds (0 days 02:00:40)
ets Jul 29 2019 12:21:46
rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
I (326) main: rtc=7199 seconds (0 days 01:59:59) ------------------------------------   = -41sec
I (336) main: start
I (336) main: 0

3) boot
I (3540336) main: 59
I (3600336) main: end
I (3600336) main: rtc=10802 seconds (0 days 03:00:02)
ets Jul 29 2019 12:21:46
rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
I (326) main: rtc=10804 seconds (0 days 03:00:04) ------------------------------------   = +2sec
I (336) main: start
I (336) main: 0

4) boot
I (3540336) main: 59
I (3600336) main: end
I (3600336) main: rtc=14405 seconds (0 days 04:00:05)
ets Jul 29 2019 12:21:46
rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
I (326) main: rtc=14395 seconds (0 days 03:59:55) ------------------------------------   = -10sec
I (336) main: start
I (336) main: 0

Hi! Sorry for the delay, It might be linked to this issue https://github.com/espressif/esp-idf/issues/9455 as well. I will revisit the whole timekeeping functionality, as I remember, I saw something weird. I will provide a patch here before merging it.

Setting the config option CONFIG_ESP32_RTC_CLK_CAL_CYCLES to 0 to disable calibration causes a big time jump in the next restart, but after that, the problem is gone! So this confirms that how the calibration is done is the root cause.