cognitive-services-speech-sdk: Python SDK Doesn't Close Resources, Causes WS_ERROR_UNDERLYING_IO_ERROR

Describe the bug

The Python implementation of the SDK does not close resources. Open file handles and TCP connections grow without bound until the parent process of the SDK is killed.

While resources are left open and growing, the SDK returns "WebSocket operation failed. Internal error: 3. Error details: WS_ERROR_UNDERLYING_IO_ERROR" at an alarming rate (sometimes more than 50% of requests fail with this error).

To Reproduce

Steps to reproduce the behavior:

  1. Implement a simple gunicorn server that will call the speech to text SDK (like this example)
  2. Access your server, which should trigger speech to text
  3. Check lsof and netstat; the number of open file handles grows with every request

lsof shows the following two files held open by the gunicorn worker process indefinitely. Each request adds further file handles to the same two files, while the previous ones are never closed.

python3.7/site-packages/azure/cognitiveservices/speech/libMicrosoft.CognitiveServices.Speech.core.so
python3.7/site-packages/azure/cognitiveservices/speech/libMicrosoft.CognitiveServices.Speech.extension.kws.so

In addition to the underlying IO error from the SDK, this eventually leads to a "too many open files" system error if the SDK is used in a persistent API.
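The growth described above can also be watched from inside the worker process. The following is a minimal sketch, assuming a Linux host where `/proc/self/fd` is available; the helper names `open_fd_targets` and `count_open_fds` are made up for illustration. It counts descriptors (files and sockets), so shared libraries that lsof lists as memory mappings may not appear in it.

```python
import os
from collections import Counter

FD_DIR = "/proc/self/fd"  # assumption: Linux procfs is mounted


def open_fd_targets(fd_dir=FD_DIR):
    """Return a Counter mapping each open descriptor's target to its count."""
    targets = Counter()
    for name in os.listdir(fd_dir):
        try:
            targets[os.readlink(os.path.join(fd_dir, name))] += 1
        except OSError:
            # The descriptor used to list the directory is already gone.
            continue
    return targets


def count_open_fds(fd_dir=FD_DIR):
    """Total number of descriptors currently open in this process."""
    return sum(open_fd_targets(fd_dir).values())
```

Logging `count_open_fds()` before and after each request handler would show whether the count returns to its baseline or keeps climbing.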

netstat likewise shows dangling TCP connections that stay open indefinitely:

Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 10.200.11.137:47238     52.184.80.197:443       ESTABLISHED 30945/python3.7
tcp        0      0 10.200.11.137:47240     52.184.80.197:443       ESTABLISHED 30946/python3.7
tcp        0      0 10.200.11.137:47246     52.184.80.197:443       ESTABLISHED 30945/python3.7
  4. Restart the server. This closes all of the TCP connections and open file handles.
  5. Repeat requests without restarting the server. The SDK will very frequently return "WebSocket operation failed. Internal error: 3. Error details: WS_ERROR_UNDERLYING_IO_ERROR".
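To compare connection counts between requests and after a restart, netstat output like the sample above can be tallied per worker. This is a rough sketch that assumes the column layout shown in the sample (for instance from `netstat -tnp`; the exact flags are an assumption, the report does not state them). The function name `established_by_program` is hypothetical.

```python
from collections import Counter


def established_by_program(netstat_output):
    """Count ESTABLISHED TCP rows per 'PID/Program name' column."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, foreign, state, pid/program
        if (
            len(fields) >= 7
            and fields[0].startswith("tcp")
            and fields[5] == "ESTABLISHED"
        ):
            counts[fields[6]] += 1
    return counts


# Sample rows copied from the report above.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 10.200.11.137:47238     52.184.80.197:443       ESTABLISHED 30945/python3.7
tcp        0      0 10.200.11.137:47240     52.184.80.197:443       ESTABLISHED 30946/python3.7
tcp        0      0 10.200.11.137:47246     52.184.80.197:443       ESTABLISHED 30945/python3.7
"""

print(established_by_program(SAMPLE))
```

A per-worker count that only ever increases between restarts matches the behavior described in this report.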

Expected behavior

One of the following:

  • Python’s SpeechRecognizer class should implement a close method to clean up resources.
  • stop_continuous_recognition should clean up resources.
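To make the first option concrete, here is a sketch of how calling code might look if a close method existed. `HypotheticalRecognizer` is a stand-in written for this illustration; it is not the SDK's SpeechRecognizer, and nothing here reflects the SDK's actual API.

```python
from contextlib import closing


class HypotheticalRecognizer:
    """Stand-in for illustration only; NOT the Speech SDK's SpeechRecognizer."""

    def __init__(self):
        # Pretend this flag guards a socket and native file handles.
        self.closed = False

    def recognize_once(self):
        if self.closed:
            raise RuntimeError("recognizer is closed")
        return "recognized text"  # placeholder result

    def close(self):
        # The requested method: release connections and file handles here.
        self.closed = True


recognizer = HypotheticalRecognizer()
# closing() calls close() on exit, even if recognition raises.
with closing(recognizer):
    text = recognizer.recognize_once()

print(text, recognizer.closed)
```

With an explicit close, a long-lived worker could release resources per request instead of relying on a process restart.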

Version of the Cognitive Services Speech SDK

azure-cognitiveservices-speech==1.6.0 from Pip

Platform, Operating System, and Programming Language

  • OS: Debian Stretch, Amazon Linux 2, Ubuntu 19.04
  • Hardware - x64
model name      : Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
stepping        : 4
cpu MHz         : 2500.000
cache size      : 33792 KB
  • Programming language: Python 3.7

Additional context

  • All of my requests to the SDK use the same audio file
  • If I restart the parent worker, TCP connections and open file handles are closed
    • Additionally, with this restart the underlying IO error stops happening
  • I have reproduced this issue on several different AWS EC2 instances, as well as on locally running Docker machines so I don’t think it’s a system-level network issue.
  • I have tried the following sysctl settings to fix the issue at the system level, to no avail:
net.ipv4.tcp_fin_timeout = 5
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1

The tcpdump output for SDK connections that end with WS_ERROR_UNDERLYING_IO_ERROR ends with the following:

    52.184.80.197.https > ip-10-200-11-29.49110: Flags [F.], cksum 0xda84 (correct), seq 5031, ack 195333, win 1517, options [nop,nop,TS val 1505324782 ecr 3200938], length 0
05:07:45.758722 IP (tos 0x0, ttl 64, id 61588, offset 0, flags [DF], proto TCP (6), length 1438)
    ip-10-200-11-29.49110 > 52.184.80.197.https: Flags [.], cksum 0xa0f2 (incorrect -> 0xc34a), seq 197445:198831, ack 5032, win 343, options [nop,nop,TS val 3200950 ecr 1505324782], length 1386
05:07:45.806866 IP (tos 0x0, ttl 41, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    52.184.80.197.https > ip-10-200-11-29.49110: Flags [R], cksum 0x52c7 (correct), seq 189854933, win 0, length 0

It looks like Azure is sending a stop signal (Flags [F] and Flags [R]), but the SDK is continuing to send data anyway.
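The reading of the capture above can be checked mechanically. The sketch below pulls the TCP flags out of tcpdump lines in the verbose format shown and reports payload sent toward a peer that has already sent FIN. The helper names are made up for this example. Sending data after the peer's FIN is legal in TCP, so this only flags the pattern and does not establish a cause.

```python
import re

# tcpdump's single-character flag codes.
FLAG_NAMES = {"S": "SYN", "F": "FIN", "R": "RST", "P": "PSH", "U": "URG", ".": "ACK"}
FLAGS_RE = re.compile(r"^\s*(\S+) > (\S+): Flags \[([^\]]+)\]")
LENGTH_RE = re.compile(r"length (\d+)\s*$")


def parse_flags(line):
    """Return (src, dst, [flag names]) for a packet line, else None."""
    match = FLAGS_RE.match(line)
    if match is None:
        return None
    src, dst, flags = match.groups()
    names = [] if flags == "none" else [FLAG_NAMES.get(ch, ch) for ch in flags]
    return src, dst, names


def sends_after_peer_fin(lines):
    """List (sender, payload_bytes) for data sent to a peer that already sent FIN."""
    fin_from = set()
    late = []
    for line in lines:
        parsed = parse_flags(line)
        if parsed is None:
            continue  # IP header lines and anything else that is not a TCP line
        src, dst, flags = parsed
        if "FIN" in flags:
            fin_from.add(src)
        length = LENGTH_RE.search(line)
        payload = int(length.group(1)) if length else 0
        if dst in fin_from and payload > 0:
            late.append((src, payload))
    return late
```

Run over the lines quoted above, `sends_after_peer_fin` would report the 1386-byte segment the client sends after the server's FIN.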

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 16 (8 by maintainers)

Most upvoted comments

@Checkroth: The SDK update has been released. I’m closing this for now, please reopen/create a new issue if there are problems.

The bug in the current version can lead to packets being dropped or sent out of order under high network load. This breaks decryption on the server, which then aborts the connection. Slowing down the input helps to reduce the network load; in the absence of this error, the SDK can accept input data at any rate. The SDK buffers it internally and throttles to a speed the service expects.

The fix will address the network problem and thus make the throttling of the stream unnecessary; the error should be gone after the update.
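For anyone still on an affected version, the "slowing down the input" workaround mentioned above might look roughly like the following. This is a sketch, not the SDK's own throttling: it assumes raw 16 kHz, 16-bit, mono PCM input (32000 bytes per second), and `paced_chunks` is a hypothetical helper that releases audio at about real-time speed.

```python
import time

BYTES_PER_SECOND = 16000 * 2  # assumption: 16 kHz, 16-bit, mono PCM


def paced_chunks(data, chunk_ms=100, bytes_per_second=BYTES_PER_SECOND, sleep=time.sleep):
    """Yield successive chunks of `data`, pausing so output is roughly real time."""
    # Guard against a zero step when chunk_ms is very small.
    chunk_size = max(1, bytes_per_second * chunk_ms // 1000)
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        yield chunk
        # Pause for the audio duration this chunk represents.
        sleep(len(chunk) / bytes_per_second)
```

Each yielded chunk would then be handed to whatever feeds the recognizer, for example a push-style audio input stream if the SDK version in use provides one. The `sleep` parameter is injectable so the pacing can be tested without waiting.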