python-ibmdb: ibm_db connect() blocks main thread when server not responding until timeout

Hi, while working on SlashDB, we faced an issue that the .connect() function in the thread blocks all the threads while it trying to connect. When a server is down, or IP is wrong the connection will block all threads until it times out. So the whole web server hangs. To reproduce firewall must drop the packages, so it will timeout, not just quick reject.

Seems that only .connect() is affected. The .execute(), even long, does not block. Please see attached code to reproduce the issue.

Tested on ibm_db installed by pip.

  • 3.0.3
  • 3.0.4
  • 3.1.0
  • 2.0.5.1

The cli driver is handled by pip too. The platform is Linux.

from threading import Thread
from time import sleep
import ibm_db
import ibm_db_dbi
from sqlalchemy import create_engine

# blocks main
def sa_connect():
    print('connecting')
    engine = create_engine('db2://a:a@1.2.3.4')
    engine.connect()
    print('done')

# blocks main
def ibm_db_connect():
    print('connecting')
    ibm_db.connect("DATABASE=name;HOSTNAME=1.2.3.4;PORT=60000;PROTOCOL=TCPIP;UID=username;PWD=password;", "", "")
    print('done')

# blocks main
def ibm_dbi_connect():
    print('connecting')
    ibm_db_dbi.connect("DATABASE=name;HOSTNAME=1.2.3.4;PORT=60000;PROTOCOL=TCPIP;UID=username;PWD=password;", "", "")
    print('done')

# not blocks main
def sock_connect():
    import socket
    sock = socket.socket()
    sock.connect(('1.2.3.4', 60000))

# not blocks main
def delay_query():
    print('connecting')
    engine = create_engine('db2://user:pswd@127.0.0.1:50000/Chinook')
    engine.connect()
    print('executing delay')
    engine.execute('call dbms_alert.sleep(20)')

print('start')
thread = Thread(target=ibm_db_connect)
thread.start()
for i in range(20):
    print('.', end='')  # will stop printing until unblocked
    sleep(1)
    
thread.join()
print('end')

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 18 (8 by maintainers)

Most upvoted comments

This reproduces easily on my linux (ubuntu 20.04) with the test program shown above. It also reproduces easily on Win10 21H2 x64.

I am presuming that the expectation is that the main thread will show some fullstops characters on the screen (to show that the main thread is running) before the child-thread throws the timeout exception when failing to reach a nonexistent IP-address. What actually happens is that the exception text is thrown first, followed by the main thread printing the dots on screen, in other words the main thread was prevented from executing while SQLDriverConnectW() was timing out in the child thread, and the main thread resumed after the child-thread threw the exception.

Debugging shows that as soon as ibm_db.c calls (inside the child thread) the SQLDriverConnectW(...) then the main thread becomes blocked. That is behaviour I associate with userspace threads on linux.

The ibm_db_connect_helper() function does not bracket this SQLDriverConnectW() API with Py_BEGIN_ALLOW_THREADS; and Py_END_ALLOW_THREADS; although it does this for other cli APIs and I don’t know the reason.

The issue is not about not throwing or throwing the wrong errors from API, the issue is about stopping all the threads while the connection is not timeout.

Sample code

from threading import Thread
from time import sleep, time
import ibm_db


# blocks main
def ibm_db_connect():
    print('connecting')
    ibm_db.connect("DATABASE=name;HOSTNAME=1.2.3.4;PORT=60000;PROTOCOL=TCPIP;UID=username;PWD=password;", "", "")
    print('thread done')


print('start')
thread = Thread(target=ibm_db_connect)
thread.start()
start_time = time()
for i in range(20):
    print('.', end='')  # will stop printing until unblocked
    sleep(0.5)
print('\nmain done')
print('main duration', time() - start_time, 'expected - about 10')

thread.join()
print('all end')

Actual result - printing of dots paused until timeout. Main thread is running for 140s instead of 10.

start
connecting
..Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/kotofos/.pyenv/versions/3.9.8/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/home/kotofos/.pyenv/versions/3.9.8/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/home/kotofos/.config/JetBrains/PyCharm2021.2/scratches/scratch.py", line 9, in ibm_db_connect
    ibm_db.connect("DATABASE=name;HOSTNAME=1.2.3.4;PORT=60000;PROTOCOL=TCPIP;UID=username;PWD=password;", "", "")
Exception: [IBM][CLI Driver] SQL30081N  A communication error has been detected. Communication protocol being used: "TCP/IP".  Communication API being used: "SOCKETS".  Location where the error was detected: "1.2.3.4".  Communication function detecting the error: "connect".  Protocol specific error code(s): "110", "*", "*".  SQLSTATE=08001 SQLCODE=-30081
..................
main done
main duration 140.81787991523743 expected - about 10
all end

Expected - nonstop printing of dots.

start
connecting
....................
main done
main duration 10.011082887649536 expected - 10

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/kotofos/.pyenv/versions/3.9.8/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/home/kotofos/.pyenv/versions/3.9.8/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/home/kotofos/.config/JetBrains/PyCharm2021.2/scratches/scratch.py", line 9, in ibm_db_connect
    ibm_db.connect("DATABASE=name;HOSTNAME=1.2.3.4;PORT=60000;PROTOCOL=TCPIP;UID=username;PWD=password;", "", "")
Exception: [IBM][CLI Driver] SQL30081N  A communication error has been detected. Communication protocol being used: "TCP/IP".  Communication API being used: "SOCKETS".  Location where the error was detected: "1.2.3.4".  Communication function detecting the error: "connect".  Protocol specific error code(s): "110", "*", "*".  SQLSTATE=08001 SQLCODE=-30081
all end

Hi, thanks, timeout settings helps to reduce the problem. Though even a short 2s timeout still blocks all threads. To reproduce it should be not just a random error in the connection string, but some IP of non-existing server or a server configured to drop connections. The db2://a:a@1.2.3.4 from the example should be fine for this.

The current workaround is to do the connection check beforehand.

    sock = socket.socket()
    sock.settimeout(2)
    result = sock.connect_ex((host, port))
    if result != 0:
        raise

    engine = create_engine(connstr)
    engine.connect()

This code helps but is not 100% failproof. It still may block if sever is down just between the connection check and call to db2.connect()

I still believe in general this is unexpected and unwanted behavior, and it would be nice to have it fixed. Even by hacking the current workaround in the codebase.