panel: Memory leak in panel
I think I am still experiencing this issue: https://discourse.holoviz.org/t/panel-holoviews-bokeh-app-memory-leaks-looking-for-general-best-practices/2379
Module <module 'bokeh_app_2cc5b245240b4a44b4c714144b1685d1' from 'my_apps.py'> has extra unexpected referrers! This could indicate a serious memory leak. Extra referrers: [<cell at 0x7fd66ebf6310: module object at 0x7fd67015c2f0>]
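For readers who have not seen this warning before, the sketch below shows one way an object can end up with a cell among its referrers, using only the standard gc module. The module name fake_app and the make_callback helper are made up for illustration; this is not the check Bokeh itself performs, only a way to see what a "cell" referrer is.

```python
import gc
import types


def make_callback(module):
    # The inner function closes over `module`, so Python stores it in a cell.
    def callback():
        return module.__name__
    return callback


# Hypothetical stand-in for the per-session app module.
app_module = types.ModuleType("fake_app")
callback = make_callback(app_module)

# List the types of everything the garbage collector knows to reference the module.
referrer_types = sorted({type(r).__name__ for r in gc.get_referrers(app_module)})
print(referrer_types)
```

As long as callback stays alive (for example, registered somewhere global), the cell keeps the module alive too.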
About this issue
- State: open
- Created 3 years ago
- Comments: 38 (21 by maintainers)
As I mentioned in #2302 I can recreate this problem. I don’t know if this is the cause of the problem but I can also get a memory leak message with the following code:
Which gives the following output, note that it has more information than “normally”:
https://user-images.githubusercontent.com/19758978/129912100-d6304b2a-23f0-4c74-b05e-4d88553493db.mp4
Environment information:
conda info
conda list
Seemingly have tracked down the culprit: if you add import numba.cuda to your app, you will get the extra referrer warning. Will ask the numba folks if they have an idea.
The culprit is here: https://github.com/numba/numba/blob/master/numba/cuda/cudadrv/driver.py#L224
The error on import gets held on the numba.cuda.cudadrv.driver.driver singleton object and therefore keeps a reference to the module, which doesn’t get cleaned up. Numba team is aware and will hopefully fix this asap.
I have had this “extra unexpected referrers” warning for months with my panel app that uses datashader. Over the various combinations of versions of bokeh/panel/datashader I have used, I cannot remember a time when this issue did not occur at some point. However, I did not bother to report it, as I thought it might be a false positive or, if not, at least it did not create noticeable memory problems from a user perspective in my case.
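The mechanism described for the numba singleton (a stored exception keeping a module alive) can be reproduced in isolation. The following is a minimal sketch under assumptions: FakeDriver and load_app are hypothetical names, not numba's actual code, and the behavior shown relies on CPython keeping frame locals alive through an exception's traceback.

```python
import gc
import types
import weakref


class FakeDriver:
    """Hypothetical stand-in for a process-wide singleton that caches an error."""

    def __init__(self):
        self.initialization_error = None


def load_app(driver, keep_error):
    # `module` is a local of this frame; the traceback of any exception
    # caught here references the frame, and therefore the module.
    module = types.ModuleType("fake_bokeh_app")
    module_ref = weakref.ref(module)
    try:
        raise RuntimeError("simulated driver initialization failure")
    except RuntimeError as exc:
        if keep_error:
            driver.initialization_error = exc  # the error outlives this call
    return module_ref


# Case 1: the singleton keeps the exception, so the module cannot be freed.
leaky_driver = FakeDriver()
leaked = load_app(leaky_driver, keep_error=True)
gc.collect()
module_still_alive = leaked() is not None

# Dropping the stored exception releases traceback, frame, and module.
leaky_driver.initialization_error = None
gc.collect()
module_freed_after_clear = leaked() is None

# Case 2: the exception is not stored, so the module is freed on return.
clean_driver = FakeDriver()
clean = load_app(clean_driver, keep_error=False)
gc.collect()
module_freed_without_error = clean() is None

print(module_still_alive, module_freed_after_clear, module_freed_without_error)
```

Storing only the error message, or clearing the exception's traceback before caching it, would avoid holding the frames in a setup like this one.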
If there’s indeed a real memory leak involved, I’m looking forward to finding a way to prevent that. It’s not clear to me if the root cause may in part lie in the application code itself or not though.
@andhuang-CLGX I am currently not convinced that these plots are entirely accurate in a useful sense. See, e.g.:
https://distributed.dask.org/en/latest/worker.html#memory-not-released-back-to-the-os
In the work I have done so far I can confirm that after a session cleanup there are not any Bokeh Sessions, Documents or document modules, Bokeh models (modulo one cycle I am still finishing cleaning up), or excess DataFrames anywhere in the Python runtime. This is absolutely certain from direct inspection of gc.get_objects(): those types of objects were present in gc.get_objects() before cleanup, and are 100% gone afterwards.
Yet, reported RSS does not really shrink after cleanup. Except until it does. If you look in the “part 5” PR you can see that I contrived an example to add 1GB of memory every session. If there is actually a leak then I would expect to eventually OOM in short order. But what happens, opening one session after another, is that memory is eventually reclaimed according to the reported RSS, but only after ~2GB is exceeded in total. This pattern of growing and shrinking repeats indefinitely, and there is never any OOM. It seems undeniable to me that this reported number is being modulated by something at a lower level that we cannot control.
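The kind of direct inspection with gc.get_objects() mentioned above can be sketched roughly as follows. Session is a hypothetical stand-in for per-session objects such as documents or models, and count_instances is an illustrative helper, not part of Bokeh or Panel.

```python
import gc


class Session:
    """Hypothetical per-session object; the self-reference forms a cycle."""

    def __init__(self):
        self.payload = bytearray(1024)
        self.me = self  # only the cyclic garbage collector can free this


def count_instances(cls):
    # Collect first so that unreachable cycles are not counted as live.
    gc.collect()
    return sum(1 for obj in gc.get_objects() if type(obj) is cls)


sessions = [Session() for _ in range(5)]
before = count_instances(Session)

sessions.clear()  # simulate session cleanup
after = count_instances(Session)

print(f"before cleanup: {before}, after cleanup: {after}")
```

A count that does not return to zero after cleanup points to a lingering reference inside the Python runtime, independent of what RSS reports.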
What I would actually want to see demonstrated at this point is a real, actual OOM. That would definitively prove that memory is being leaked. But I would be surprised if that is possible with pure Bokeh. It might be possible that there are leaks in Panel, but that will require its own investigation.
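Short of provoking an OOM, one way to separate "Python still holds the memory" from "the allocator has not returned it to the OS" is to measure from inside the interpreter with the standard tracemalloc module. This is only a sketch: the 50 MB payload is an arbitrary stand-in for session data, and tracemalloc sees only allocations made through Python's allocators, not those made directly by native libraries.

```python
import tracemalloc

tracemalloc.start()
baseline, _ = tracemalloc.get_traced_memory()

# Simulate a session that holds roughly 50 MB of data.
session_data = [bytearray(1024 * 1024) for _ in range(50)]
held, _ = tracemalloc.get_traced_memory()

# Simulate session cleanup: drop the only reference.
del session_data
released, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

held_mb = (held - baseline) / 1e6
released_mb = (released - baseline) / 1e6
print(f"held during session: {held_mb:.1f} MB, after cleanup: {released_mb:.1f} MB")
```

If the traced figure drops back after cleanup while RSS stays high, the remaining memory is being retained below the Python object level.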
thanks for your hard work! I really appreciate your quick updates; I will try it out tomorrow and let you know.
@andhuang-CLGX There are a lot of low-level things to clean up; that PR was just Part 1 of several (as the title implied). Since you seem able to test things out from source, here is the latest installment, which is about 95% done, at least from the perspective of anything reproduced by the OP in bokeh/bokeh#11477
Latest PR: https://github.com/bokeh/bokeh/pull/11523
It would certainly be helpful to know if it helps the situation here as well, though as @philippjfr notes, the problem here may lie on the Panel side (e.g. holding references too aggressively)
To offer more context, I am using an app with datashader + a lot of callbacks + a database
But I’ll try to find an MCVE on my personal time
Maybe this https://github.com/bokeh/bokeh/issues/11477 is relevant.