esp-idf: ESP32 STA deauth frame causing reconnect issues (IDFGH-6544)

Environment

  • Development Kit: Custom
  • Kit version NA
  • Module or chip used: ESP32-WROOM-32D

I am not yet up and running with an IDF example, as this was discovered downstream in Arduino. I finally had time to look into this long-standing issue and am trying to identify and resolve the two or three issues it encompasses. https://github.com/espressif/arduino-esp32/issues/2501

Problem Description

The Frame in question

This change was made sometime in a late 3.x SDK, I think. I assume it may come from a CVE patch, but I cannot find the exact documentation for it; maybe someone can tip me off. Most of the DoS/injection attacks target WPA2-Enterprise, not WPA2-PSK, shrug… I am not a Wi-Fi expert.

I am hoping PMF/WPA3 solves this… I have not been able to confirm that yet.

The immediate issue is that this causes some APs to respond to these STA deauths with a deauth of their own (reason code 2) or other bizarre responses; I have also seen 4-way handshake failures. This happens within a timeout period after the last association, so mostly when the device resets.
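For reading sniffer captures of this exchange, here is a minimal C sketch that extracts the reason code from a raw, unprotected Deauthentication frame. It assumes the standard 24-byte IEEE 802.11 management header (Deauthentication is type 0, subtype 12) followed by a 2-byte little-endian reason code; `parse_deauth_reason` and `reason_str` are hypothetical helpers, not ESP-IDF or Arduino APIs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Frame Control byte 0: bits 0-1 protocol version, bits 2-3 type, bits 4-7 subtype. */
#define FC0_TYPE(b)      (((b) >> 2) & 0x3)
#define FC0_SUBTYPE(b)   (((b) >> 4) & 0xF)
#define FC1_PROTECTED    0x40u /* Protected Frame bit in Frame Control byte 1 */
#define MGMT_HDR_LEN     24    /* FC, Duration, Addr1-3, Sequence Control */
#define SUBTYPE_DEAUTH   12

/* Returns true and stores the reason code if buf holds an unprotected deauth frame. */
static bool parse_deauth_reason(const uint8_t *buf, size_t len, uint16_t *reason)
{
    if (len < MGMT_HDR_LEN + 2)
        return false;
    if (FC0_TYPE(buf[0]) != 0 || FC0_SUBTYPE(buf[0]) != SUBTYPE_DEAUTH)
        return false;
    if (buf[1] & FC1_PROTECTED)
        return false; /* body is encrypted (PMF), reason code is not readable */
    *reason = (uint16_t)(buf[MGMT_HDR_LEN] | (buf[MGMT_HDR_LEN + 1] << 8));
    return true;
}

/* Short descriptions for the two reason codes seen in this thread. */
static const char *reason_str(uint16_t r)
{
    switch (r) {
    case 2:  return "previous authentication no longer valid";
    case 3:  return "sending STA is leaving (or has left) the ESS";
    default: return "other";
    }
}
```

Once PMF keys are in place, deauth frames are sent protected and their body is encrypted, so the sketch rejects them rather than returning a meaningless reason.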

[Screenshots: 2022-01-04, 10:05:23 AM and 10:05:37 AM]

The router response

This is my UniFi nanoHD AP sending a deauth back:

Nov 20 22:24:01 nanohd f492bf7f354a,UAP-nanoHD-5.60.9+12980: stahtd: stahtd[2269]: [STA-TRACKER].stahtd_dump_event(): {"message_type":"STA_ASSOC_TRACKER","assoc_delta":"0","wpa_auth_failures":"1","mac":"7c:df:a1:00:77:ec","vap":"ra0","event_type":"failure","assoc_status":"0","event_id":"12","auth_delta":"0","auth_ts":"2360096.826707"}

Nov 20 22:49:10 nanohd f492bf7f354a,UAP-nanoHD-5.60.9+12980: hostapd: ra0: STA 7c:df:a1:00:77:ec IEEE 802.11: disassociated
Nov 20 22:49:10 nanohd f492bf7f354a,UAP-nanoHD-5.60.9+12980: kernel: ra0: AUTH - receive DE-AUTH(seq-3796) from 7c:df:a1:00:77:ec, reason=3
Nov 20 22:49:10 nanohd f492bf7f354a,UAP-nanoHD-5.60.9+12980: kernel: ra0: (MlmeDeAuthAction)Send DEAUTH frame with ReasonCode(2) to 7c:df:a1:00:77:ec
Nov 20 22:49:10 nanohd f492bf7f354a,UAP-nanoHD-5.60.9+12980: : wevent[2268]: wevent.ubnt_custom_event(): EVENT_STA_LEAVE ra0: 7c:df:a1:00:77:ec / 9
Nov 20 22:49:10 nanohd f492bf7f354a,UAP-nanoHD-5.60.9+12980: hostapd: ra0: STA 7c:df:a1:00:77:ec IEEE 802.11: disassociated
[  1479][V][WiFiGeneric.cpp:96] set_esp_interface_ip(): Configuring Station static IP: 0.0.0.0, MASK: 0.0.0.0, GW: 0.0.0.0
[WIFI] Connecting to wifi... [20000 ms]

.....................E (7394) wifi:Association refused temporarily, comeback time 1048 mSec
.....................[  5611][V][WiFiGeneric.cpp:289] _arduino_event_cb(): STA Disconnected: SSID: ssid, BSSID: f4:92:bf:7f:35:4b, Reason: 203
[  5612][D][WiFiGeneric.cpp:831] _eventCallback(): Arduino Event: 5 - STA_DISCONNECTED
[  5619][W][WiFiGeneric.cpp:852] _eventCallback(): Reason: 203 - ASSOC_FAIL
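The "comeback time 1048 mSec" above is consistent with the AP advertising an Association Comeback time of 1024 TU: 802.11 expresses it in Time Units of 1024 µs, and 1024 × 1.024 ms ≈ 1048.6 ms. That value is inferred from the number, not stated in the log. A small C helper for the conversion follows; `comeback_tu_to_ms` is a hypothetical name.

```c
#include <stdint.h>

/* IEEE 802.11 Time Unit: 1 TU = 1024 microseconds. */
#define US_PER_TU 1024u

/* Convert an Association Comeback time (Timeout Interval element, value in TUs)
 * to whole milliseconds, rounding down. 64-bit intermediate avoids overflow. */
static uint32_t comeback_tu_to_ms(uint32_t tu)
{
    return (uint32_t)(((uint64_t)tu * US_PER_TU) / 1000u);
}
```

For example, `comeback_tu_to_ms(1024)` yields 1048, matching the log line.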

Now there are 2 things going on here.

  1. Why is this behavior happening? Is it a deauth protection mechanism on both sides? Can we turn it off, and should it be turned off for some 802.11 protocols? Or is it a security mechanism and we are stuck with it?

1.b Even with a workaround and reconnect, we are looking at connection times of 8-16 seconds, versus 1.6 s in previous SDKs. That makes BSSID and channel caching an absolute requirement now, and even with caching, connect times are still double, at 3-5 s. Battery-powered devices be damned… lol

  2. Missing event reason codes to properly identify these cases in Arduino, so that a reconnect could be deferred and handled independently of auto-reconnect. As of now this usually fails with no useful reason other than an auth failure, which is confusing. Again, we are still looking at 8 s reconnects.
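One way to defer reconnects by reason, sketched in plain C: the enum values mirror ESP-IDF's `wifi_err_reason_t` codes 200-204 (203 is the `ASSOC_FAIL` in the log above), but the policy itself, `reconnect_delay_ms`, and `should_drop_cached_bssid` are hypothetical. The sketch also assumes the application somehow knows the AP's comeback time; as far as this thread shows, the disconnect event only reports 203, so passing 0 falls back to exponential backoff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values mirror ESP-IDF's wifi_err_reason_t (esp_wifi_types.h). */
enum sta_disc_reason {
    REASON_BEACON_TIMEOUT    = 200,
    REASON_NO_AP_FOUND       = 201,
    REASON_AUTH_FAIL         = 202,
    REASON_ASSOC_FAIL        = 203,
    REASON_HANDSHAKE_TIMEOUT = 204,
};

/* Hypothetical policy: delay before the next connect attempt.
 * comeback_ms is 0 when the AP's comeback time is unknown. */
static uint32_t reconnect_delay_ms(uint16_t reason, uint32_t comeback_ms, unsigned attempt)
{
    /* Exponential backoff: 500 ms, 1 s, 2 s, ... capped at 16 s. */
    uint32_t backoff = 500u << (attempt < 5 ? attempt : 5);

    if (reason == REASON_ASSOC_FAIL && comeback_ms > 0)
        return comeback_ms + 100u; /* honour the AP's comeback time plus a small margin */
    return backoff;
}

/* A cached BSSID/channel is only worth discarding if the AP was not found there. */
static bool should_drop_cached_bssid(uint16_t reason)
{
    return reason == REASON_NO_AP_FOUND;
}
```

Keeping the cached BSSID on `ASSOC_FAIL`/`AUTH_FAIL` reflects the observation in this thread that the AP is present and simply still holds state for the STA.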

refs# https://github.com/espressif/arduino-esp32/issues/2501#issuecomment-977947879

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 60 (10 by maintainers)

Most upvoted comments

@tablatronix No, that is not the patch for the above-mentioned issue.

It works with STA PMF off

ESP-IDF v4.4-172-g730ca0ea43-dirty_PATCH3_pmfcapableFALSE.pcapng.zip

The only issue I can think of is that the IDF says PMF cannot be disabled on the S2. So I am assuming this WILL work with PMF on, with PMF required on the AP, and with WPA3?

I can test all these things this week.

Test PMF

  • AP required / STA Optional
  • AP disabled / STA Optional
  • AP WPA3
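The test matrix above can be modelled with the PMF negotiation rule from IEEE 802.11w: each side advertises MFPC (capable) and MFPR (required) bits in its RSN Capabilities, PMF is used when both are capable, and association fails if one side requires PMF that the other cannot do. The mapping of ESP-IDF's `pmf_cfg.capable`/`pmf_cfg.required` onto MFPC/MFPR is an assumption here, and `pmf_negotiate` is a hypothetical helper. WPA3-Personal (SAE-only) mandates PMF, so the "AP WPA3" row should behave like "AP required".

```c
#include <stdbool.h>
#include <stdint.h>

/* RSN Capabilities field bits (IEEE 802.11w). */
#define RSN_CAP_MFPR (1u << 6) /* Management Frame Protection Required */
#define RSN_CAP_MFPC (1u << 7) /* Management Frame Protection Capable */

enum pmf_result { PMF_OFF, PMF_ON, PMF_MISMATCH };

/* Decide the PMF outcome from the AP's and STA's RSN capability bits. */
static enum pmf_result pmf_negotiate(uint16_t ap_caps, uint16_t sta_caps)
{
    bool ap_c  = (ap_caps & RSN_CAP_MFPC) != 0;
    bool ap_r  = (ap_caps & RSN_CAP_MFPR) != 0;
    bool sta_c = (sta_caps & RSN_CAP_MFPC) != 0;
    bool sta_r = (sta_caps & RSN_CAP_MFPR) != 0;

    if ((ap_r && !sta_c) || (sta_r && !ap_c))
        return PMF_MISMATCH;               /* association is refused */
    return (ap_c && sta_c) ? PMF_ON : PMF_OFF;
}
```

With "AP required / STA optional" this returns `PMF_ON`, "AP disabled / STA optional" returns `PMF_OFF`, and a STA with PMF off against a PMF-required AP cannot associate at all.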

Hi @tablatronix, thanks for the capture. The initial issue of the AP not responding to the Assoc Req is solved. Capture before the patch: [image]

From the above capture, we see that the AP sends a deauth after auth and does not respond to the Assoc Req. After the patch: [image]

We see that the AP responds with an Assoc Resp with a comeback time of 1 sec. This is expected, as PMF is enabled and the STA info has not been cleared from the AP. But the AP is not sending any SA Query packets.
Packet 16407 is an ACK to the STA, but the packet before it, which could be the Assoc Resp, was not captured. To confirm this, can you take the sniffer capture again? Also, can you disable PMF on the STA and see if it can connect? Once PMF is disabled, the AP will accept the connection the second time without delay.
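The expected AP behaviour described here can be sketched as a small state model, following the SA Query procedure in IEEE 802.11w: if the AP still holds a PMF association for the MAC, it refuses the new Association Request with status 30 (refused temporarily, with a comeback time) and starts an SA Query; a rebooted STA has lost its keys and never answers, so once the SA Query times out the AP drops the old state and accepts the next attempt. `struct ap_sta`, `ap_handle_assoc_req`, and the timeout parameter are hypothetical; the case where a live STA answers the SA Query is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define STATUS_SUCCESS             0
#define STATUS_REFUSED_TEMPORARILY 30 /* IEEE 802.11 status code 30 */

/* Hypothetical model of the AP's per-STA state. */
struct ap_sta {
    bool     associated;           /* AP still holds an association for this MAC */
    bool     pmf;                  /* that association negotiated PMF */
    bool     sa_query_active;      /* SA Query in progress */
    uint32_t sa_query_deadline_ms; /* when the SA Query times out */
};

/* AP-side handling of an (unprotected) Association Request at time now_ms. */
static int ap_handle_assoc_req(struct ap_sta *s, uint32_t now_ms, uint32_t sa_query_timeout_ms)
{
    if (s->associated && s->pmf) {
        if (!s->sa_query_active) {
            /* Start SA Query; a live STA would reply with a protected SA Query Response. */
            s->sa_query_active = true;
            s->sa_query_deadline_ms = now_ms + sa_query_timeout_ms;
            return STATUS_REFUSED_TEMPORARILY;
        }
        if (now_ms < s->sa_query_deadline_ms)
            return STATUS_REFUSED_TEMPORARILY; /* still waiting, STA must come back later */
        /* No response before the deadline: treat the old association as gone. */
        s->associated = false;
        s->sa_query_active = false;
    }
    s->associated = true; /* accept the new association */
    return STATUS_SUCCESS;
}
```

In this model an AP that never starts the SA Query would keep refusing, which is one reading of the capture where no SA Query packets appear.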

Not sure how to check that the patch is really in there, but it built fine after a fullclean after replacing the files, so I am assuming it is.

With the v4.4 branch (v4.4-172-g730ca0ea43) + patch_for_reconnect_issue_4.4_esp32.zip, you should see “-dirty” in the Wi-Fi firmware version: I (1094) wifi:wifi firmware version: 7679c42-dirty

With a clean v4.4 branch tree: I (1093) wifi:wifi firmware version: 7679c42
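If the boot log is being captured by a test script, the check above reduces to a suffix test on the version string. A minimal C sketch, assuming the value after "wifi firmware version: " has already been extracted from the log; `fw_version_is_dirty` is a hypothetical helper.

```c
#include <stdbool.h>
#include <string.h>

/* True if a Wi-Fi firmware version string such as "7679c42-dirty" ends in "-dirty". */
static bool fw_version_is_dirty(const char *version)
{
    static const char suffix[] = "-dirty";
    size_t n = strlen(version);
    size_t m = sizeof suffix - 1; /* length without the terminating NUL */
    return n >= m && strcmp(version + n - m, suffix) == 0;
}
```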

Hi @tablatronix, please clone IDF 4.4 (“release/v4.4”) and check out 730ca0ea43. Replace the files in “/home/esppool/idf/4.4/esp-idf/components/esp_wifi/lib/esp32” with the attached files; you will then be able to build. Please take the sniffer capture with the patch. patch_for_reconnect_issue_4.4_esp32.zip

Well, here is me hardware-resetting every 10 seconds. It's clear that every second boot fails with an auth fail. Obviously AP state is a factor: if the STA was connected and reconnects, it always fails with an auth fail.

Hope it helps figure something out… I'm not sure what beacons have to do with it, as it is not intermittent or random; it's the same over and over. But it didn't do this on older IDFs. I will roll back and capture those too, which is why I thought the teardown was changing this behavior now.

every_other_hardware_reset.pcapng.zip

Hi @tablatronix, can we have the complete capture to understand the issue better?