python-mss: [Windows] MSS is not thread-safe

The problem

Hello, I am trying to do the following:

  • Send an HTTP request from JS
  • Handle it in Python with Flask
  • When Flask gets a request, it grabs the current display view (a screenshot), resizes it, converts it to base64 and returns it as the response

The first request works fine, but when I send a second HTTP request, python-mss fails.

(I have included both the .js and .py files in case they help.)

server.py

# Image processing, FPS in console
import mss, cv2, base64, time
import numpy as np
from PIL import Image as i

# Get current frame of second monitor
def getFrame():
    start_time = time.perf_counter()

    # Get frame (only rgb - smaller size)
    frame_rgb     = mss.mss().grab(mss.mss().monitors[2]).rgb # type: bytes, len: 1280*720*3 (w, h, r, g, b)

    # Convert it from bytes to resize
    frame_image   = i.frombytes("RGB", (1280, 720), frame_rgb, "raw", "RGB") # PIL.Image.Image
    frame_array   = np.array(frame_image) # type: numpy.ndarray
    frame_resized = cv2.resize(frame_array, (640, 360), interpolation = cv2.INTER_CUBIC) # type: numpy.ndarray

    # Encode to base64 - prepared to send
    frame_base64  = base64.b64encode(frame_resized) # type: bytes, len: 640*360*4 (base64 turns every 3 of the 640*360*3 raw bytes into 4 characters)

    print(f'{ round( 1 / (time.perf_counter() - start_time), 2) } fps')
    return frame_base64

# Flask request handler
from flask import Flask, request
from flask_cors import CORS

app = Flask(__name__)
cors = CORS(app)

@app.route('/frame_base64')
def frame_base64():
    return getFrame()

app.run(debug=True, port=7999)

script.js (it looks a little odd because it is compiled from CoffeeScript)

(function() {

  $(document).ready(function() {

    return $.ajax({
      type: 'get',
      url: ' http://127.0.0.1:7999/frame_base64',
      success: (response) => {
        return console.log(response);
      }
    });

  });

}).call(this);

Full console log:

D:\web\projects\html-display-stream\backend>python server.py
 * Serving Flask app "server" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: on
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 258-577-249
 * Running on http://127.0.0.1:7999/ (Press CTRL+C to quit)
18.0 fps
127.0.0.1 - - [26/Feb/2020 00:36:05] "GET /frame_base64 HTTP/1.1" 200 -
127.0.0.1 - - [26/Feb/2020 00:36:06] "GET /frame_base64 HTTP/1.1" 500 -
Traceback (most recent call last):
  File "C:\Users\Roman\AppData\Local\Programs\Python\Python37\lib\site-packages\flask\app.py", line 2463, in __call__
    return self.wsgi_app(environ, start_response)
  File "C:\Users\Roman\AppData\Local\Programs\Python\Python37\lib\site-packages\flask\app.py", line 2449, in wsgi_app
    response = self.handle_exception(e)
  File "C:\Users\Roman\AppData\Local\Programs\Python\Python37\lib\site-packages\flask_cors\extension.py", line 161, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "C:\Users\Roman\AppData\Local\Programs\Python\Python37\lib\site-packages\flask\app.py", line 1866, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "C:\Users\Roman\AppData\Local\Programs\Python\Python37\lib\site-packages\flask\_compat.py", line 39, in reraise
    raise value
  File "C:\Users\Roman\AppData\Local\Programs\Python\Python37\lib\site-packages\flask\app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Users\Roman\AppData\Local\Programs\Python\Python37\lib\site-packages\flask\app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "C:\Users\Roman\AppData\Local\Programs\Python\Python37\lib\site-packages\flask_cors\extension.py", line 161, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "C:\Users\Roman\AppData\Local\Programs\Python\Python37\lib\site-packages\flask\app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "C:\Users\Roman\AppData\Local\Programs\Python\Python37\lib\site-packages\flask\_compat.py", line 39, in reraise
    raise value
  File "C:\Users\Roman\AppData\Local\Programs\Python\Python37\lib\site-packages\flask\app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\Roman\AppData\Local\Programs\Python\Python37\lib\site-packages\flask\app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "D:\web\projects\html-display-stream\backend\server.py", line 33, in frame_base64
    return getFrame()
  File "D:\web\projects\html-display-stream\backend\server.py", line 11, in getFrame
    frame_rgb     = mss.mss().grab(mss.mss().monitors[2]).rgb # type: bytes, len: 1280*720*3 (w, h, r, g, b)
  File "C:\Users\Roman\AppData\Local\Programs\Python\Python37\lib\site-packages\mss\windows.py", line 291, in grab
    raise ScreenShotError("gdi32.GetDIBits() failed.")
mss.exception.ScreenShotError: gdi32.GetDIBits() failed.

Additional info:

  • Windows 7 x64 Service Pack 1
  • Monitors: 1920×1080, 1280×720
  • Python 3.7.6
  • pip 20.0.2
  • python-mss: latest version at the time of writing

P.S.: Sorry for my bad English.

Thanks

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 19 (15 by maintainers)

Most upvoted comments

During the performance test, I found that this bug is not limited to the invalid srcdc and the overwritten bmp. When grab runs in a large loop without any time.sleep(), bmp/srcdc/memdc (the underlying Windows objects) are written by multiple threads at the same time, which causes unpredictable errors that end in gdi32.GetDIBits() failed. So for the following performance tests I added a lock to the original MSS class and acquire it inside the grab method, as I mentioned before.

Here we take a total of 1000 full-size screenshots of two monitors using 1, 10 and 100 threads. No significant performance gap was observed.

With old mss
1000 shots *   1 threads: total 50.170 seconds, 19.932 fps 
 100 shots *  10 threads: total 50.147 seconds, 19.941 fps
  10 shots * 100 threads: total 50.282 seconds, 19.888 fps
With new mss
1000 shots *   1 threads: total 50.179 seconds, 19.929 fps
 100 shots *  10 threads: total 50.015 seconds, 19.994 fps
  10 shots * 100 threads: total 50.196 seconds, 19.922 fps
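
A harness along the following lines could produce numbers in this shape. It is only a sketch, since the original test script is not shown in this thread: `benchmark` and its parameters are hypothetical names, and `grab` stands for any callable that takes one screenshot (for example a function wrapping `sct.grab(...)`).

```python
import threading
import time


def benchmark(grab, total_shots, n_threads):
    """Split total_shots evenly over n_threads and time the whole run.

    `grab` is any zero-argument callable that takes one screenshot.
    Returns (shots_taken, elapsed_seconds, shots_per_second).
    """
    per_thread = total_shots // n_threads

    def worker():
        for _ in range(per_thread):
            grab()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start

    shots = per_thread * n_threads
    return shots, elapsed, shots / elapsed
```

Under these assumptions, `benchmark(take_shot, 1000, 10)` would correspond to the "100 shots * 10 threads" row, where `take_shot` is whatever function performs the grab.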

@BoboTiG I have looked into your code and tested a few things around this bug; it seems to be a multi-threading issue.

On Windows, the handle (HDC) of the device context (DC) is stored in the class attribute MSS.srcdc. srcdc is assigned only once, when the FIRST MSS instance is created, which means it is shared among all threads for the whole lifetime of the process. When the program finishes, the system is left to release the DC automatically (I found nothing in the code that releases these resources).

That is the problem. In my tests, once the THREAD that FIRST created an MSS instance is destroyed, the DC is recycled. As a result the HDC (srcdc) is no longer a valid handle, even though its value has not changed. On every later call we only check that the HDC is non-null, not that it is still valid in the Win32 system. In the end we cannot get valid bits through the outdated HDC, and the error is raised.

BTW, since I am not familiar with Win32 programming, I am not sure whether a DC is automatically recycled when the corresponding thread dies. What I have confirmed is that the HDC (srcdc) becomes an invalid handle, while memdc is safe. If the DC is not actually released, the following solutions may leak memory.

Test case

import multiprocessing
import threading
import time
from mss import mss

lock = threading.Lock()

def prefix():
    return f'[{time.time() - t0:.3f}s] [{multiprocessing.current_process().name}-{threading.current_thread().name}] - '

def screenshot():
    with lock:
        with mss() as sct:
            mon = sct.monitors[1]
            shot = sct.grab(mon).rgb
            return shot

def run_child(loops=5):
    n = 0
    while n < loops:
        n += 1
        time.sleep(1)
        try:
            res = f'bytes length {len(screenshot())}'
        except Exception as e:
            res = e
        print(prefix() + f'No.{n}: res = {res}')
    print(prefix() + f'exit.')

def create_job(name=None, loops=5, process=False):
    if process:
        return multiprocessing.Process(target=run_child, name=name, args=(loops,))
    else:
        return threading.Thread(target=run_child, name=name, args=(loops,))

# %%
if __name__ == '__main__':
    # screenshot()  # if first call in main thread, every thing is ok.
    _process = False  # run in multi-process(True) or multi-thread(False) mode.
    t0 = time.time()
    job1 = create_job('ThreadA', 4, _process)
    job2 = create_job('ThreadB', 4, _process)
    job3 = create_job('ThreadC', 4, _process)
    job1.start()
    job2.start()
    time.sleep(2)
    job3.start()

Output

[1.099s] [MainProcess-ThreadA] - No.1: res = bytes length 6220800
[1.177s] [MainProcess-ThreadB] - No.1: res = bytes length 6220800
[2.197s] [MainProcess-ThreadA] - No.2: res = bytes length 6220800
[2.256s] [MainProcess-ThreadB] - No.2: res = bytes length 6220800
[3.084s] [MainProcess-ThreadC] - No.1: res = bytes length 6220800
[3.261s] [MainProcess-ThreadA] - No.3: res = bytes length 6220800
[3.326s] [MainProcess-ThreadB] - No.3: res = bytes length 6220800
[4.161s] [MainProcess-ThreadC] - No.2: res = bytes length 6220800
[4.331s] [MainProcess-ThreadA] - No.4: res = bytes length 6220800
[4.331s] [MainProcess-ThreadA] - exit.
[4.337s] [MainProcess-ThreadB] - No.4: res = gdi32.GetDIBits() failed.
[4.337s] [MainProcess-ThreadB] - exit.
[5.169s] [MainProcess-ThreadC] - No.3: res = gdi32.GetDIBits() failed.
[6.177s] [MainProcess-ThreadC] - No.4: res = gdi32.GetDIBits() failed.
[6.177s] [MainProcess-ThreadC] - exit.

Let’s put together the examples discussed in this issue:

  1. The first instantiation happens in MainThread, whose lifetime is the same as that of its process => everything is OK (this also means multiprocessing is safe).
    • Call mss() in MainThread, or set/import a global MSS = mss() in MainThread.
  2. The first instantiation happens in child thread A, then another child thread B or MainThread calls grab(): if A is already dead => failure; if A is still alive => success.
    • That is the trouble @RamoFX ran into: Flask runs multi-threaded by default and every incoming request is handled in a child thread. If mss is not instantiated once before flask_app.run(), the error is raised from the 2nd request on.
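
The first case can be turned into a workaround for the Flask setup. The sketch below is not part of mss: `SharedGrabber` is a hypothetical helper that creates one capture object in whichever thread constructs it (meant to be the main thread, before `app.run()`) and serialises `grab()` calls with a lock. It only assumes the object exposes `monitors` and `grab()`, as mss instances do.

```python
import threading


class SharedGrabber:
    """Hypothetical helper: one capture object, created once, used under a lock."""

    def __init__(self, factory):
        # Construct this from the main thread so that the first
        # instantiation of the capture object happens there.
        self._sct = factory()
        self._lock = threading.Lock()

    def grab(self, monitor_index):
        # Only one thread at a time may touch the shared capture object.
        with self._lock:
            return self._sct.grab(self._sct.monitors[monitor_index])
```

With mss installed, this would presumably be used as `grabber = SharedGrabber(mss.mss)` at module level, with the request handler calling `grabber.grab(2)` instead of creating new instances per request.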

Solutions

As I said, mss might be unsafe in multi-threaded mode. There are mainly two things to do:

  1. Add a threading.Lock to prevent the resources (bmp/memdc/srcdc) from being modified by multiple threads at the same time (this is necessary, but not related to this issue): lock.acquire() in MSSBase.__enter__() and lock.release() in MSSBase.__exit__().
  2. Ensure the validity of srcdc. For this second point, there are several options:

S1 - release resource every time

In this approach the resources are created and released every time, so bmp/srcdc/memdc could be instance attributes rather than class attributes. It is probably the safest way, but grabbing the screen costs more time, especially when it is called frequently. (It might go against your original intention of making them class attributes to improve speed?)

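import win32gui  # provided by the pypiwin32 package

class MSS(MSSBase):
    def close(self):  # type: () -> None
        # here I use pypiwin32 package directly
        # release both DCs so that the next instance has to recreate them
        win32gui.ReleaseDC(0, MSS.srcdc)
        win32gui.DeleteDC(MSS.memdc)
        MSS.srcdc = MSS.memdc = None
        super().close()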

S2 - check validity of srcdc every time

I don’t know whether there is a Win32 API to check the validity of an HDC. If there is, this should be a good solution.

    # old code in MSS.__init__()
    if not MSS.srcdc or not MSS.memdc:
        MSS.srcdc = self.user32.GetWindowDC(0)
        MSS.memdc = self.gdi32.CreateCompatibleDC(MSS.srcdc)
    
    # new code for solution 2, if there is a function that acts like is_valid_hdc
    if not MSS.srcdc or not is_valid_hdc(MSS.srcdc):
        MSS.srcdc = self.user32.GetWindowDC(0)
    if MSS.memdc is None:
        MSS.memdc = self.gdi32.CreateCompatibleDC(MSS.srcdc)
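
One candidate for the `is_valid_hdc` placeholder is the GDI function `GetObjectType`, which returns 0 when it is given a handle that is not a valid GDI object. Whether it actually reports the stale srcdc described in this thread as invalid has not been verified here, so the following is a sketch to experiment with rather than a confirmed fix. The `get_object_type` parameter is an assumption added so the logic can be exercised without Windows.

```python
import sys

# Object type codes from wingdi.h for the two kinds of DC used here.
OBJ_DC = 3
OBJ_MEMDC = 10


def _gdi32_get_object_type():
    """Return gdi32.GetObjectType with its signature declared (Windows only)."""
    import ctypes
    from ctypes import wintypes

    func = ctypes.windll.gdi32.GetObjectType
    func.argtypes = [wintypes.HGDIOBJ]
    func.restype = wintypes.DWORD
    return func


def is_valid_hdc(hdc, get_object_type=None):
    """Best-effort check that hdc still refers to a device context.

    get_object_type can be injected for testing; by default the real
    gdi32.GetObjectType is looked up, which is only possible on Windows.
    """
    if not hdc:
        return False
    if get_object_type is None:
        if sys.platform != "win32":
            raise OSError("GetObjectType is only available on Windows")
        get_object_type = _gdi32_get_object_type()
    # 0 means the handle is unknown; any other non-DC type is also rejected.
    return get_object_type(hdc) in (OBJ_DC, OBJ_MEMDC)
```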

S3 - maintain a dict of thread-srcdc pair or a variable of alive thread

Hmm, this solution is not so beautiful, but it solves exactly this problem and reduces how often resources are created and released. A thread can only be sure that itself and the main thread are alive. So when an instance is created, look up the current thread and the main thread in a maintained dict (e.g. MSS._srcdc_map); if neither is found, create a new DC.

    cur_thread, main_thread = threading.current_thread(), threading.main_thread()
    MSS.srcdc = MSS._srcdc_map.get(cur_thread) or MSS._srcdc_map.get(main_thread) or self.user32.GetWindowDC(0)
    MSS._srcdc_map[cur_thread] = MSS.srcdc
    if MSS.memdc is None:
        MSS.memdc = self.gdi32.CreateCompatibleDC(MSS.srcdc)
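
The lookup order of S3 can be tried out in isolation. In this sketch `ThreadHandleCache` is a hypothetical name, and `create` stands in for the call that obtains a new DC (such as `user32.GetWindowDC(0)`); releasing handles is deliberately left out, so `prune()` only forgets entries of threads that have exited.

```python
import threading


class ThreadHandleCache:
    """Hypothetical cache that follows the S3 lookup order."""

    def __init__(self, create):
        self._create = create    # callable returning a fresh handle
        self._handles = {}       # Thread object -> handle
        self._lock = threading.Lock()

    def get(self):
        cur = threading.current_thread()
        main = threading.main_thread()
        with self._lock:
            # 1) handle already used by this thread, 2) the main thread's
            # handle, 3) otherwise create a new one.
            handle = self._handles.get(cur)
            if handle is None:
                handle = self._handles.get(main)
            if handle is None:
                handle = self._create()
            self._handles[cur] = handle
            return handle

    def prune(self):
        """Forget entries whose thread has exited (does not release handles)."""
        with self._lock:
            for thread in [t for t in self._handles if not t.is_alive()]:
                del self._handles[thread]
```

The design choice here is that a handle created by a short-lived child thread is never handed to another thread, which is the situation that fails in the test case above.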

What’s more

This bug only affects srcdc: in the solutions above it may need to be created more than once, whereas memdc is always available and can be created once. In the code offered above I use the pypiwin32 package directly; if only ctypes.WinDLL is used, _set_cfunctions has to be extended accordingly. (And if the screen configuration changes while the program is running, is it necessary to update srcdc and memdc? How would changes be detected? This may not be closely related to this issue.)

If you mean the tests regression_issue_128 and regression_issue_135, they all pass in both the main thread and a child thread.

With pleasure, I’ll create a PR later. One more thing that concerns me is the safety of MSS.bmp. Inside grab in ThreadA, a.gdi32.BitBlt transfers data from srcdc to memdc, and a.gdi32.GetDIBits then reads it back out of memdc through MSS.bmp. If another ThreadB runs b.gdi32.BitBlt before a.gdi32.GetDIBits, the final screenshot of ThreadA will be the same as ThreadB’s, because they use the same MSS.bmp.

Test threading lock

I added a 3-second delay between gdi32.BitBlt and gdi32.GetDIBits, plus a flag (with_lock) that decides whether grab acquires the lock. Last time I suggested acquiring the lock when mss() is created, but acquiring it only inside grab may be a better choice. The modified windows.py and test.py are hosted on gist.

Console

[2020-04-27 14:44:14,440] [MainThread] ===== set with_lock = False =====
[2020-04-27 14:44:14,442] [MainThread] start shot
[2020-04-27 14:44:18,160] [MainThread] end shot
[2020-04-27 14:44:18,167] [ThreadA] start shot
[2020-04-27 14:44:19,169] [ThreadB] start shot
[2020-04-27 14:44:21,593] [ThreadA] end shot
[2020-04-27 14:44:22,529] [ThreadB] end shot
[2020-04-27 14:44:22,531] [MainThread] ===== set with_lock = True =====
[2020-04-27 14:44:22,533] [MainThread] start shot
[2020-04-27 14:44:22,533] [MainThread] in grab has acquired lock
[2020-04-27 14:44:25,681] [MainThread] in grab has released lock
[2020-04-27 14:44:26,285] [MainThread] end shot
[2020-04-27 14:44:26,292] [ThreadA] start shot
[2020-04-27 14:44:26,293] [ThreadA] in grab has acquired lock
[2020-04-27 14:44:27,294] [ThreadB] start shot
[2020-04-27 14:44:29,381] [ThreadA] in grab has released lock
[2020-04-27 14:44:29,381] [ThreadB] in grab has acquired lock
[2020-04-27 14:44:29,970] [ThreadA] end shot
[2020-04-27 14:44:32,490] [ThreadB] in grab has released lock
[2020-04-27 14:44:32,761] [ThreadB] end shot
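
The same experiment can be imitated without Windows. The toy model below is not the gist code: `FakeGrabber` is a hypothetical stand-in whose class attribute `shared_bmp` plays the role of MSS.bmp, and a `time.sleep` replaces the artificial delay between BitBlt and GetDIBits.

```python
import threading
import time


class FakeGrabber:
    """Toy stand-in for a grabber that stages pixels in a shared class attribute."""

    shared_bmp = None          # plays the role of MSS.bmp
    lock = threading.Lock()

    def __init__(self, use_lock):
        self.use_lock = use_lock

    def _grab(self, label, delay):
        FakeGrabber.shared_bmp = label    # step 1: like BitBlt filling the bitmap
        time.sleep(delay)                 # artificial gap between the two steps
        return FakeGrabber.shared_bmp     # step 2: like GetDIBits reading it back

    def grab(self, label, delay=0.3):
        if self.use_lock:
            with FakeGrabber.lock:
                return self._grab(label, delay)
        return self._grab(label, delay)


def run_two_threads(use_lock):
    """Start ThreadA, then ThreadB slightly later; return what each one read."""
    grabber = FakeGrabber(use_lock)
    results = {}

    def worker(label):
        results[label] = grabber.grab(label)

    a = threading.Thread(target=worker, args=("A",))
    b = threading.Thread(target=worker, args=("B",))
    a.start()
    time.sleep(0.1)    # B starts while A is between its two steps
    b.start()
    a.join()
    b.join()
    return results
```

With these delays, `run_two_threads(False)` should show thread A reading back B's data, while `run_two_threads(True)` lets each thread read its own, mirroring the overridden screenshots described in this comment.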

Screenshots

Without lock: wrong screenshots.

  • Two monitors in one screenshot: monitor-0-no-lock
  • Two monitors in separate screenshots: monitor-1 of ThreadA is overridden by ThreadB. monitor-1-no-lock + monitor-2-no-lock

With lock: works as expected.

  • Two monitors in one screenshot: monitor-0-with-lock
  • Two monitors in separate screenshots: monitor-1-with-lock + monitor-2-with-lock

I will open a PR with both the srcdc_dict and the lock fixes.

FTR the code works fine on macOS. Will try later on Windows.

And what if you move the MSS object to the root of the script, with something like MSS = mss.mss(), and use it to grab: frame_rgb = MSS.grab(MSS.monitors[2])?