homie-esp8266: [develop-v3] OTA failing
Hi people,
I’ve been hitting my head against the wall trying to OTA update an ESP8266. The code is minimalistic
#include "Arduino.h"
#include <Homie.h>
HomieNode sensorNode("sensor", "Sensor", "sensor");
unsigned long lastSent = 0;
void loopHandler() {
unsigned long m = millis();
if ( m - lastSent >= 10 * 1000UL || lastSent == 0) {
Homie.getLogger() << "Fake value: " << m << endl;
sensorNode.setProperty("value").send(String(m));
lastSent = millis();
}
}
void setup() {
Serial.begin(115200);
Homie.getLogger() << "kkkkkkkkkkkkkkkkkkkkkk" << endl;
Homie.getMqttClient().setKeepAlive(120);
Homie_setFirmware("test", "1.0.0");
Homie.setLoopFunction(loopHandler);
sensorNode.advertise("value").setName("Value").setDatatype("double");
Homie.setup();
}
void loop(){
Homie.loop();
}
The platformio.ini file is :
[env:nodemcuv2]
platform = espressif8266
board = nodemcuv2
framework = arduino
build_flags =
-D HOMIE_MDNS=0
-D PIO_FRAMEWORK_ARDUINO_LWIP2_LOW_MEMORY
upload_speed = 115200
lib_ldf_mode = deep
lib_deps =
ArduinoJson@6.13.0
https://github.com/homieiot/homie-esp8266.git#develop-v3
I also repartitioned the mcu to make sure there is enough space:
# Name, Type, SubType, Offset, Size, Flags
nvs, data, nvs, 0x9000, 0x6000,
phy_init, data, phy, 0xf000, 0x1000,
factory, app, factory, 0x10000, 3M,
Here are my libs :
Dependency Graph
|-- <ArduinoJson> 6.13.0
|-- <Homie> 3.0.0 #7818c73
| |-- <ArduinoJson> 6.13.0
| |-- <AsyncMqttClient> 0.8.2
| | |-- <ESPAsyncTCP> 1.2.2
| | | |-- <ESP8266WiFi> 1.0
| |-- <Bounce2> 2.52
| |-- <ESP Async WebServer> 1.2.3
| | |-- <ESPAsyncTCP> 1.2.2
| | | |-- <ESP8266WiFi> 1.0
| | |-- <Hash> 1.0
| | |-- <ESP8266WiFi> 1.0
| | |-- <ArduinoJson> 6.13.0
| |-- <ESP8266mDNS> 1.2
| | |-- <ESP8266WiFi> 1.0
| |-- <ESP8266WiFi> 1.0
| |-- <DNSServer> 1.1.1
| | |-- <ESP8266WiFi> 1.0
| |-- <ESPAsyncTCP> 1.2.2
| | |-- <ESP8266WiFi> 1.0
| |-- <Ticker> 1.0
| |-- <ESP8266HTTPClient> 1.2
| | |-- <ESP8266WiFi> 1.0
Also for some reason, not sure why, platform IO is not recognizing the size of my partitions …
DATA: [===== ] 48.9% (used 40048 bytes from 81920 bytes)
PROGRAM: [===== ] 49.8% (used 520632 bytes from 1044464 bytes)
Shouldn’t it read 3M and not 1MB (1044464 bytes)
Anyways, then what I do is modify the sketch to change the logging line after initializing Serial. Then build and then invoke :
$ python ota_updater.py -l 192.168.0.227 -p 1883 -t devices/ -i testbox /home/nemi/workspace/green-jac/homie_i2c/.pio/build/nodemcuv2/firmware.bin
Connecting to mqtt broker 192.168.0.227 on port 1883
Connected with result code 0
Waiting for device to come online...
Waiting for device info...
Publishing new firmware with checksum 3373df653b869cf431689d254d3b1e37
[ ] 989/545344Waiting for device info...
[ ] 989/545344Expecting checksum 3373df653b869cf431689d254d3b1e37, got b6b11dc49ab3c6400b1995025dc996f1, update failed!
Sometimes the serial output shows and blocks:
Receiving OTA payload
↕ OTA started
Triggering OTA_STARTED event...
Firmware is binary
✖ OTA failed (500 INTERNAL_ERROR 11)
Triggering OTA_FAILED event...
ERROR 11 according to this means UPDATE_ERROR_BAD_ARGUMENT
Some other times I get :
Receiving OTA payload
↕ OTA started
Triggering OTA_STARTED event...
Firmware is binary
Receiving OTA firmware (453/545344)...
Receiving OTA firmware (989/545344)...
ets Jan 8 2013,rst cause:4, boot mode:(3,6)
wdt reset
load 0x4010f000, len 1384, room 16
tail 8
chksum 0x2d
csum 0x2d
v951aeffa
~ld
Here is what gets published :
------- SOMETHING THAT SEEMS TO BE THE BINARY BEING SENT OVER MQTT --------
devices/testbox/$implementation/ota/status 202
devices/testbox/$implementation/ota/status 206 453/545344
devices/testbox/$implementation/ota/status 206 989/545344
------- RESET -------
devices/testbox/$state init
devices/testbox/$homie 3.0.1
devices/testbox/$name Test
....
As u see in the output the board restarts but with the old firmware, it’s like it starts getting the binary but then decides (?) to restart before it got it all. I read through this other issue https://github.com/homieiot/homie-esp8266/issues/613 and tried the keep alive thing but did not work either
Then I also added the flag but it makes it even worse
-D HOMIE_CONFIG=0
I get the following crash if I flash the board with a firmware built with that flag and then OTA:
0x40225144: operator delete[](void*) at ../../../../../dl/gcc-xtensa/libstdc++-v3/libsupc++/del_opv.cc line 33
0x40225174: operator new(unsigned int) at ../../../../../dl/gcc-xtensa/libstdc++-v3/libsupc++/new_op.cc line 45
0x40225135: std::set_terminate(void (*)()) at ../../../../../dl/gcc-xtensa/libstdc++-v3/libsupc++/eh_terminate.cc line 70
0x40225120: __fpclassifyd at ../../../../../../dl/newlib-xtensa/newlib/libm/common/s_fpclassify.c line 26
0x402252b1: glue2esp_linkoutput at glue-esp/lwip-esp.c line 239
0x40225533: new_linkoutput at glue-lwip/lwip-git.c line 216
0x40225174: operator new(unsigned int) at ../../../../../dl/gcc-xtensa/libstdc++-v3/libsupc++/new_op.cc line 45
0x4022ce14: etharp_output_to_arp_index at core/ipv4/etharp.c line 764
0x4022e6da: inet_chksum_pseudo at core/inet_chksum.c line 266
0x4022dba2: ip4_output_if_opt at core/ipv4/ip4.c line 820
0x4022e717: inet_chksum_pseudo at core/inet_chksum.c line 327
0x4022a8a2: tcp_output at core/tcp_out.c line 1312
0x40228ba4: tcp_input at core/tcp_in.c line 487
0x4022d919: ip4_input at core/ipv4/ip4.c line 696
0x4022589d: ethernet_input_LWIP2 at netif/ethernet.c line 170
0x402256d3: esp2glue_alloc_for_recv at glue-lwip/lwip-git.c line 419
0x4024a51e: ethernet_input at glue-esp/lwip-esp.c line 353
0x4024a52f: ethernet_input at glue-esp/lwip-esp.c line 355
0x40225135: std::set_terminate(void (*)()) at ../../../../../dl/gcc-xtensa/libstdc++-v3/libsupc++/eh_terminate.cc line 70
0x40225120: __fpclassifyd at ../../../../../../dl/newlib-xtensa/newlib/libm/common/s_fpclassify.c line 26
0x40225135: std::set_terminate(void (*)()) at ../../../../../dl/gcc-xtensa/libstdc++-v3/libsupc++/eh_terminate.cc line 70
0x40225174: operator new(unsigned int) at ../../../../../dl/gcc-xtensa/libstdc++-v3/libsupc++/new_op.cc line 45
Any ideas what might be wrong or what else to try before diving into the code? I still have to try with ESP32 to see if the same thing happens. Thanks!
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 27 (17 by maintainers)
Hey guys, I found the problem and learned a few things along the way.
The hardware WD timer was being hit as direct consequence of a memory bug. I timed the memcpy + unique_ptr piece of code to be aprox at 50 microsecs.
Here is what’s wrong : When you get to ota topic that has as payload the binary firmware it first goes through
The bug resides in that the tokens are pieces of the internal buffer copied into _mqttTopicCopy. When doing OTA, the _onMqttMessage is “artificially” invoked for the chunked FW (here), every time a new call to _onMqttMessage is done kills the previous instance of the _mqttTopicCopy (here) and thus the pointers that were stored on _mqttTopicLevels become invalid, leading to undefined behavior and eventually a timeout.
I’ll work on a fix and send the PR.
Thanks !
@nemidiy your support is greatly appreciated!
@luebbe @sradner13 thanks for the input. the memcpy code is copying the mqtt topic, not the payload. https://github.com/homieiot/homie-esp8266/blob/develop-v3/src/Homie/Boot/BootNormal.cpp#L786 The topic should be string no more that 50 chars. Anyways, since I am the rookie here I will time that piece of code to see how much time it takes. Regardless there is clearly something wrong on the copying that’s causing the issue. I will do some debugging and come back. My end goal is having OTA over SSL working eventually.