esp-idf: ESP32: Wifi/Network stack broken after beacon timeout (IDFGH-10357)

Answers checklist.

  • I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

IDF version.

v4.3.1

Operating System used.

macOS

How did you build your project?

Command line with idf.py

If you are using Windows, please specify command line type.

None

Development Kit.

Custom Board with ESP32DOWDQ6

Power Supply used.

External 3.3V

What is the expected behavior?

The ESP32 connects to wifi and stays connected normally until it is powered off or a call to esp_wifi_disconnect() is made.

What is the actual behavior?

Our ESP32 devices randomly lose their Wifi connection; this happens regularly in the field. The ESP32 operates correctly, then suddenly two error messages appear:

wifi:m f null
wifi:bcn_timout,ap_probe_send_start

The first message is hard to interpret, but the second suggests that a timeout happened: the AP did not send a beacon within the expected interval. Normally, the ESP32 would actively probe the AP five times to “re-enable” the connection. However, in this case that does not happen, and the ESP32 stays in this state until a reboot is performed.

All network services stop working: no more web server, mdns, udp sockets or aws connection. The network is completely down. At first I thought it could be caused by a memory leak, but when it happens plenty of memory is still free (60K left).

The problem is that the issue is rather random and not reproducible. It seems that using the “WIFI_PS_MIN_MODEM” power saving mode and connecting to Ubiquiti appliances triggers the problem more often.

Steps to reproduce.

That’s the problem, we cannot reproduce it. We see it happening a lot; it happened yesterday on an ESP32 we were monitoring, and all we got were the two messages above. It seems to be related to power saving mode and to Ubiquiti appliances.

Debug Logs.

No response

More Information.

No response

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 56 (11 by maintainers)

Most upvoted comments

Hi @FnxQT, there was a holiday over the past few days, so sorry for the late reply.
I looked at the captured packets and they seem to be Ethernet ones; a wireless capture would be helpful 😃

And I think I have found the reason for this issue: a mistake in the Wi-Fi power-save state machine when handling a failure to allocate a null data frame. I have attached another lib below, which should work around this issue.

The m f null message may still appear; it is caused by a shortage of memory at that moment. The blocked packets should be released later, once the memory has been recovered. If you still see no traffic for a long time after m f null, please feel free to provide us with the packet capture and log. Thanks! esp32_76263271.zip

Hi @Espressif-liuuuu,

thank you for the follow-up. However, as suggested by @Alvin1Zhang, it would be great to see the fix backported to v4.4 and v4.3, since they are still under maintenance. Our products are currently using 4.4.4 and we can’t easily make the jump to v5.1…

What do you think about that?

Hey @Espressif-liuuuu, I just returned from days off. Before going, I added the esp_get_free_internal_heap_size print where you told me to.

However, during the previous week, the problem did not trigger again. I will keep you updated.

Hello @Espressif-liuuuu,

Since Monday, the wifi:m f null appeared only once and it has not caused any problem! Wifi was still working afterwards, and so was our web interface. However, I logged the internal DMA capable RAM as you said, and both the amount before and after the wifi:m f null were above 35K free.

Is there something else that could cause the error?

I will continue monitoring the system and keep you updated. I’m out of office for a week though.

Yes, that's so weird. Could you please add a check at the line below for when heap_caps_malloc returns NULL, and log the internal memory with esp_get_free_internal_heap_size? Let's make sure the memory is sufficient at that moment.

https://github.com/espressif/esp-idf/blob/8b94183c9cb47ede8f02df5598d8b9d68c754860/components/esp_wifi/esp32/esp_adapter.c#L544C4-L544C4
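The requested check could be sketched roughly as follows. This is only an illustration, not the actual code at the linked line: the wrapper name logged_dma_malloc, the failure counter and the capability flags passed to heap_caps_malloc are assumptions, and the #else branch contains host-side stand-ins (with placeholder values) so that the sketch also compiles off-target.

```c
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

#ifdef ESP_PLATFORM
#include "esp_system.h"     /* esp_get_free_internal_heap_size() */
#include "esp_heap_caps.h"  /* heap_caps_malloc(), MALLOC_CAP_* */
#else
/* Host-side stand-ins (hypothetical), only so the sketch builds off-target. */
#include <stdlib.h>
#define MALLOC_CAP_DMA      (1u << 3)
#define MALLOC_CAP_INTERNAL (1u << 11)
static int g_fail_next_alloc = 0;  /* set to 1 to simulate an allocation failure */
static void *heap_caps_malloc(size_t size, uint32_t caps)
{
    (void)caps;
    return g_fail_next_alloc ? NULL : malloc(size);
}
static uint32_t esp_get_free_internal_heap_size(void)
{
    return 35u * 1024u;  /* placeholder value, not a measurement */
}
#endif

/* Number of failed allocations seen so far (hypothetical helper state). */
static unsigned g_alloc_failures = 0;

/* Allocate internal, DMA-capable memory and, on failure, log how much
 * internal heap is free at that exact moment. */
void *logged_dma_malloc(size_t size)
{
    void *p = heap_caps_malloc(size, MALLOC_CAP_DMA | MALLOC_CAP_INTERNAL);
    if (p == NULL) {
        g_alloc_failures++;
        printf("alloc of %u bytes failed, free internal heap: %u bytes\n",
               (unsigned)size, (unsigned)esp_get_free_internal_heap_size());
    }
    return p;
}
```

If the log shows plenty of free internal heap at the moment of failure, the shortage is probably not a plain out-of-memory condition, which is what this check is meant to confirm or rule out.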

Hi @FnxQT, sorry for the late reply! This one is a little bit difficult to debug, so let's first confirm two things on v4.4.4:

  1. the m f null happened
  2. there was no log such as beacon timeout or wifi disconnect, but the STA could no longer be seen in the AP's list

The log you provided shows that the hardware was still working, but only a little traffic was seen. We may need a bit more information:

  1. A Wi-Fi packet capture taken with a sniffer would be great; please set your AP to OPEN so that it is easier for us to debug
  2. A quick workaround for this issue is turning off the Wi-Fi power-save; you could try esp_wifi_set_ps(WIFI_PS_NONE)
  3. Is it easy to reproduce the issue? A project that reproduces it would be appreciated
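
The power-save workaround from item 2 could look like the sketch below. It uses esp_wifi_set_ps() and esp_wifi_get_ps() from esp_wifi.h with WIFI_PS_NONE (numeric value 0); the helper name disable_wifi_power_save is an assumption made for illustration, and the #else branch holds host-side stand-ins so that the sketch compiles off-target. On the device it is assumed to be called after the Wi-Fi driver has been initialized.

```c
#ifdef ESP_PLATFORM
#include "esp_err.h"
#include "esp_wifi.h"  /* esp_wifi_set_ps(), esp_wifi_get_ps(), wifi_ps_type_t */
#else
/* Host-side stand-ins (hypothetical), only so the sketch builds off-target. */
typedef int esp_err_t;
#define ESP_OK   0
#define ESP_FAIL (-1)
typedef enum { WIFI_PS_NONE, WIFI_PS_MIN_MODEM, WIFI_PS_MAX_MODEM } wifi_ps_type_t;
static wifi_ps_type_t g_ps = WIFI_PS_MIN_MODEM;
static esp_err_t esp_wifi_set_ps(wifi_ps_type_t type) { g_ps = type; return ESP_OK; }
static esp_err_t esp_wifi_get_ps(wifi_ps_type_t *type) { *type = g_ps; return ESP_OK; }
#endif

/* Turn Wi-Fi power save off and read the setting back to confirm it. */
esp_err_t disable_wifi_power_save(void)
{
    esp_err_t err = esp_wifi_set_ps(WIFI_PS_NONE);
    if (err != ESP_OK) {
        return err;
    }

    wifi_ps_type_t current;
    err = esp_wifi_get_ps(&current);
    if (err != ESP_OK) {
        return err;
    }
    return (current == WIFI_PS_NONE) ? ESP_OK : ESP_FAIL;
}
```

Disabling power save raises current consumption, so this is a diagnostic workaround rather than a fix for battery-powered products.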

Hi @FnxQT, we are analyzing this, and if there is any progress we will inform you as soon as possible.