python-ibmdb: ibm_db connect() blocks main thread when server not responding until timeout
Hi, while working on SlashDB, we faced an issue that the .connect() function in the thread blocks all the threads while it trying to connect. When a server is down, or IP is wrong the connection will block all threads until it times out. So the whole web server hangs. To reproduce firewall must drop the packages, so it will timeout, not just quick reject.
Seems that only .connect() is affected. The .execute(), even long, does not block. Please see attached code to reproduce the issue.
Tested on ibm_db installed by pip.
- 3.0.3
- 3.0.4
- 3.1.0
- 2.0.5.1
The cli driver is handled by pip too. The platform is Linux.
from threading import Thread
from time import sleep
import ibm_db
import ibm_db_dbi
from sqlalchemy import create_engine
# blocks main
def sa_connect():
print('connecting')
engine = create_engine('db2://a:a@1.2.3.4')
engine.connect()
print('done')
# blocks main
def ibm_db_connect():
print('connecting')
ibm_db.connect("DATABASE=name;HOSTNAME=1.2.3.4;PORT=60000;PROTOCOL=TCPIP;UID=username;PWD=password;", "", "")
print('done')
# blocks main
def ibm_dbi_connect():
print('connecting')
ibm_db_dbi.connect("DATABASE=name;HOSTNAME=1.2.3.4;PORT=60000;PROTOCOL=TCPIP;UID=username;PWD=password;", "", "")
print('done')
# not blocks main
def sock_connect():
import socket
sock = socket.socket()
sock.connect(('1.2.3.4', 60000))
# not blocks main
def delay_query():
print('connecting')
engine = create_engine('db2://user:pswd@127.0.0.1:50000/Chinook')
engine.connect()
print('executing delay')
engine.execute('call dbms_alert.sleep(20)')
print('start')
thread = Thread(target=ibm_db_connect)
thread.start()
for i in range(20):
print('.', end='') # will stop printing until unblocked
sleep(1)
thread.join()
print('end')
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 18 (8 by maintainers)
This reproduces easily on my linux (ubuntu 20.04) with the test program shown above. It also reproduces easily on Win10 21H2 x64.
I am presuming that the expectation is that the main thread will show some fullstops characters on the screen (to show that the main thread is running) before the child-thread throws the timeout exception when failing to reach a nonexistent IP-address. What actually happens is that the exception text is thrown first, followed by the main thread printing the dots on screen, in other words the main thread was prevented from executing while SQLDriverConnectW() was timing out in the child thread, and the main thread resumed after the child-thread threw the exception.
Debugging shows that as soon as ibm_db.c calls (inside the child thread) the
SQLDriverConnectW(...)then the main thread becomes blocked. That is behaviour I associate with userspace threads on linux.The
ibm_db_connect_helper()function does not bracket thisSQLDriverConnectW()API withPy_BEGIN_ALLOW_THREADS;andPy_END_ALLOW_THREADS;although it does this for other cli APIs and I don’t know the reason.The issue is not about not throwing or throwing the wrong errors from API, the issue is about stopping all the threads while the connection is not timeout.
Sample code
Actual result - printing of dots paused until timeout. Main thread is running for 140s instead of 10.
Expected - nonstop printing of dots.
Hi, thanks, timeout settings helps to reduce the problem. Though even a short 2s timeout still blocks all threads. To reproduce it should be not just a random error in the connection string, but some IP of non-existing server or a server configured to drop connections. The
db2://a:a@1.2.3.4from the example should be fine for this.The current workaround is to do the connection check beforehand.
This code helps but is not 100% failproof. It still may block if sever is down just between the connection check and call to db2.connect()
I still believe in general this is unexpected and unwanted behavior, and it would be nice to have it fixed. Even by hacking the current workaround in the codebase.