esp-idf: [BUG] send() sometimes gets stuck ? (IDFGH-461)

As i have tested tcp client example i ran it for a couple of hours until send() got stuck and didn’t return any value

I’m calling it this way : send(sock, data_to_send, strlen(data_to_send), 0);

This bug is repreducable

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Comments: 21 (2 by maintainers)

Most upvoted comments

yes , Enable this option, test it, thank you

hi @mikisch81 I will use your code to reproduce this problem. Thank you for your feedback

Hello, I am encountering a similar issue.

I am running a WROVER device - ESP32D0WDQ5 (revision 1), ESP-IDF official version 3.3.1.

When sending a large data buffer (> 1MB) from PSRAM every 30 seconds to a server which resides on an UBUNTU machine I get this behavior after a couple of iterations.

I also used the tcp-client example, modified for this scenario.

This is the modified tcp_client_task() function:

#define MAX_SEND_SIZE        CONFIG_TCP_SND_BUF_DEFAULT
#define PSRAM_START_ADDR     (0x3F800000)
#define LARGE_DATA_START     PSRAM_START_ADDR
#define LARGE_DATA_SIZE      (1295701)

static int send_large_data(int sock, uint8_t *buffer, size_t size)
{
    size_t remaining_size = size;
    size_t total_sent = 0;
    uint8_t *curr = buffer;

    ESP_LOGW(TAG, "Going to send total of %u bytes", size);
    while (remaining_size > 0) {
        size_t send_size = (remaining_size > MAX_SEND_SIZE) ? MAX_SEND_SIZE : remaining_size;
        int sent = send(sock, curr, send_size, 0);
        if (sent < 0) {
            ESP_LOGE(TAG, "Error occurred during sending: errno %d", errno);
            return -1;
        }
        remaining_size -= sent;
        total_sent += sent;
        ESP_LOGW(TAG, "Sent %u bytes from 0x%08x (%u/%u)", sent, (uint32_t)curr, total_sent, size);
        curr += sent;
    }
    return total_sent;
}

static void tcp_client_task(void *pvParameters)
{
    char rx_buffer[128];
    char addr_str[128];
    int addr_family;
    int ip_protocol;

    while (1) {
#ifdef CONFIG_EXAMPLE_IPV4
        struct sockaddr_in destAddr;
        destAddr.sin_addr.s_addr = inet_addr(HOST_IP_ADDR);
        destAddr.sin_family = AF_INET;
        destAddr.sin_port = htons(PORT);
        addr_family = AF_INET;
        ip_protocol = IPPROTO_IP;
        inet_ntoa_r(destAddr.sin_addr, addr_str, sizeof(addr_str) - 1);
#else // IPV6
        struct sockaddr_in6 destAddr;
        inet6_aton(HOST_IP_ADDR, &destAddr.sin6_addr);
        destAddr.sin6_family = AF_INET6;
        destAddr.sin6_port = htons(PORT);
        addr_family = AF_INET6;
        ip_protocol = IPPROTO_IPV6;
        inet6_ntoa_r(destAddr.sin6_addr, addr_str, sizeof(addr_str) - 1);
#endif

        int sock =  socket(addr_family, SOCK_STREAM, ip_protocol);
        if (sock < 0) {
            ESP_LOGE(TAG, "Unable to create socket: errno %d", errno);
            break;
        }
        ESP_LOGI(TAG, "Socket created");

        int err = connect(sock, (struct sockaddr *)&destAddr, sizeof(destAddr));
        if (err != 0) {
            ESP_LOGE(TAG, "Socket unable to connect: errno %d", errno);
            break;
        }
        ESP_LOGI(TAG, "Successfully connected");

        while (1) {
            int err = send_large_data(sock, (uint8_t *)LARGE_DATA_START, LARGE_DATA_SIZE);
            if (err < 0) {
                ESP_LOGE(TAG, "Error occured during sending: errno %d", errno);
                break;
            }
            ESP_LOGI(TAG, "Going to sleep for 30 seconds");
            vTaskDelay(30000 / portTICK_PERIOD_MS);
        }

        if (sock != -1) {
            ESP_LOGE(TAG, "Shutting down socket and restarting...");
            shutdown(sock, 0);
            close(sock);
        }
    }
    vTaskDelete(NULL);
}

I use the default configuration, with only these changes in sdkconfig.defaults:

CONFIG_WIFI_SSID=<MY_SSID>
CONFIG_WIFI_PASSWORD=<MY_PASSWORD>
CONFIG_EXAMPLE_IPV4_ADDR=<MY_SERVER_IP>

CONFIG_SPIRAM_SUPPORT=y
CONFIG_SPIRAM_BOOT_INIT=y

In Wireshark on the server-side I can see that the device sent an out-of-order packet (packet 15110) which the server responds with a duplicate ack for the needed seq.num (packet 15111), then it is stuck for 20 minutes until the retransmission actually happen (packet 22681), but by this time the operation fails with NO ROUTE TO HOST error (I guess the routing table entry has aged by this time).

stuck_test

Logs from device:

rst:0x1 (POWERON_RESET),boot:0x32 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:6788
load:0x40078000,len:13232
load:0x40080400,len:7172
entry 0x40080778
I (73) boot: Chip Revision: 1
I (73) boot_comm: chip revision: 1, min. bootloader chip revision: 0
I (39) boot: ESP-IDF v3.3.1 2nd stage bootloader
I (39) boot: compile time 18:32:44
I (39) boot: Enabling RNG early entropy source...
I (43) boot: SPI Speed      : 40MHz
I (47) boot: SPI Mode       : DIO
I (51) boot: SPI Flash Size : 2MB
I (55) boot: Partition Table:
I (59) boot: ## Label            Usage          Type ST Offset   Length
I (66) boot:  0 nvs              WiFi data        01 02 00009000 00006000
I (74) boot:  1 phy_init         RF data          01 01 0000f000 00001000
I (81) boot:  2 factory          factory app      00 00 00010000 00100000
I (89) boot: End of partition table
I (93) boot_comm: chip revision: 1, min. application chip revision: 0
I (100) esp_image: segment 0: paddr=0x00010020 vaddr=0x3f400020 size=0x20a14 (133652) map
I (156) esp_image: segment 1: paddr=0x00030a3c vaddr=0x3ffb0000 size=0x030a4 ( 12452) load
I (161) esp_image: segment 2: paddr=0x00033ae8 vaddr=0x40080000 size=0x00400 (  1024) load
0x40080000: _WindowOverflow4 at /home/michaels/esp/esp-idf-v3.3.1/components/freertos/xtensa_vectors.S:1779

I (163) esp_image: segment 3: paddr=0x00033ef0 vaddr=0x40080400 size=0x0c120 ( 49440) load
I (193) esp_image: segment 4: paddr=0x00040018 vaddr=0x400d0018 size=0x70a10 (461328) map
0x400d0018: _stext at ??:?

I (355) esp_image: segment 5: paddr=0x000b0a30 vaddr=0x4008c520 size=0x08750 ( 34640) load
0x4008c520: prvCheckTasksWaitingTermination at /home/michaels/esp/esp-idf-v3.3.1/components/freertos/tasks.c:3507

I (383) boot: Loaded app from partition at offset 0x10000
I (383) boot: Disabling RNG early entropy source...
I (384) psram: This chip is ESP32-D0WD
I (390) spiram: Found 64MBit SPI RAM device
I (393) spiram: SPI RAM mode: flash 40m sram 40m
I (398) spiram: PSRAM initialized, cache is in low/high (2-core) mode.
I (405) cpu_start: Pro cpu up.
I (409) cpu_start: Application information:
I (414) cpu_start: Project name:     esp-idf
I (419) cpu_start: App version:      1
I (423) cpu_start: Compile time:     May  5 2020 18:32:42
I (429) cpu_start: ELF file SHA256:  e81b19025ddbc8a2...
I (435) cpu_start: ESP-IDF:          v3.3.1
I (440) cpu_start: Starting app cpu, entry point is 0x400813d8
0x400813d8: call_start_cpu1 at /home/michaels/esp/esp-idf-v3.3.1/components/esp32/cpu_start.c:269

I (0) cpu_start: App cpu up.
I (1331) spiram: SPI SRAM memory test OK
I (1332) heap_init: Initializing. RAM available for dynamic allocation:
I (1332) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
I (1338) heap_init: At 3FFB90B0 len 00026F50 (155 KiB): DRAM
I (1344) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (1351) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (1357) heap_init: At 40094C70 len 0000B390 (44 KiB): IRAM
I (1364) cpu_start: Pro cpu start user code
I (1368) spiram: Adding pool of 4096K of external SPI memory to heap allocator
I (48) cpu_start: Starting scheduler on PRO CPU.
I (0) cpu_start: Starting scheduler on APP CPU.
I (49) spiram: Reserving pool of 32K of internal memory for DMA/internal allocations
I (109) wifi: wifi driver task: 3ffc8c9c, prio:23, stack:3584, core=0
I (109) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (109) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (149) wifi: wifi firmware version: ac331d7
I (149) wifi: config NVS flash: enabled
I (149) wifi: config nano formating: disabled
I (149) wifi: Init dynamic tx buffer num: 32
I (149) wifi: Init data frame dynamic rx buffer num: 32
I (159) wifi: Init management frame dynamic rx buffer num: 32
I (159) wifi: Init management short buffer num: 32
I (169) wifi: Init static tx buffer num: 16
I (169) wifi: Init static rx buffer size: 1600
I (169) wifi: Init static rx buffer num: 10
I (179) wifi: Init dynamic rx buffer num: 32
I (179) example: Setting WiFi configuration SSID miki_hadar...
I (279) phy: phy_version: 4102, 2fa7a43, Jul 15 2019, 13:06:06, 0, 0
I (279) wifi: mode : sta (30:ae:a4:df:10:fc)
I (289) example: Waiting for AP connection...
I (289) example: SYSTEM_EVENT_STA_START
I (409) wifi: new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
I (1539) wifi: state: init -> auth (b0)
I (1559) wifi: state: auth -> assoc (0)
I (1569) wifi: state: assoc -> run (10)
I (1629) wifi: connected with miki_hadar, aid = 1, channel 1, BW20, bssid = 10:5a:f7:20:1e:84
I (1629) wifi: security type: 3, phy: bgn, rssi: -59
I (1639) wifi: pm start, type: 1

I (1699) wifi: AP's beacon interval = 102400 us, DTIM period = 1
I (2599) event: sta ip: 10.100.102.19, mask: 255.255.255.0, gw: 10.100.102.1
I (2599) example: SYSTEM_EVENT_STA_GOT_IP
I (3099) example: SYSTEM_EVENT_STA_GOT_IP6
I (3099) example: IPv6: FE80::32AE:A4FF:FEDF:10FC
I (3099) example: Connected to AP
I (3099) example: Socket created
I (3449) example: Successfully connected
W (3449) example: Going to send total of 1295701 bytes
W (3449) example: Sent 5744 bytes from 0x3f800000 (5744/1295701)
W (3529) example: Sent 5744 bytes from 0x3f801670 (11488/1295701)
W (3559) example: Sent 5744 bytes from 0x3f802ce0 (17232/1295701)
W (3609) example: Sent 5744 bytes from 0x3f804350 (22976/1295701)
...
...
W (6529) example: Sent 5744 bytes from 0x3f93a200 (1292400/1295701)
W (6539) example: Sent 3301 bytes from 0x3f93b870 (1295701/1295701)
I (6539) example: Going to sleep for 30 seconds
...
...
W (1261749) example: Sent 5744 bytes from 0x3f87ccf0 (516960/1295701)
W (1261749) example: Sent 5744 bytes from 0x3f87e360 (522704/1295701)
W (1261799) example: Sent 5744 bytes from 0x3f87f9d0 (528448/1295701)
W (1261819) example: Sent 5744 bytes from 0x3f881040 (534192/1295701)
W (1262099) example: Sent 5744 bytes from 0x3f8826b0 (539936/1295701)
W (1264899) example: Sent 5744 bytes from 0x3f883d20 (545680/1295701)
W (1295539) example: Sent 5744 bytes from 0x3f885390 (551424/1295701)
E (2446359) example: Error occurred during sending: errno 113
E (2446359) example: Error occured during sending: errno 113
E (2446359) example: Shutting down socket and restarting...