pysat: Ctrl+C causes Python interpreter with pysat loaded to crash not gracefully yield an exception
Any time Ctrl+C is used to interrupt running code, it will throw a KeyboardInterrupt. But something is going on after doing SAT solving which causes it to not work all of the time and instead just exit the interpreter. Either this indicates Ctrl+C is being explicitly caught and inappropriately dealt with or some type of cleanup code is probably cause a crash.
Obviously important is that this is being run on Windows as Linux can have different behavior in this area as it uses a signal handling system which is a bit different. My event viewer clearly shows something is causing crashes and it never happens if I do not load pysat:
Faulting application name: python.exe, version: 3.8.2150.1013, time stamp: 0x5e55a7f0
Faulting module name: ntdll.dll, version: 10.0.18362.815, time stamp: 0xb29ecf52
Exception code: 0xc0000028
Fault offset: 0x00000000000fbf18
Faulting process id: 0x12448
Faulting application start time: 0x01d65234e08c1176
Faulting application path: C:\Program Files\Python38\python.exe
Faulting module path: C:\WINDOWS\SYSTEM32\ntdll.dll
Report Id: 75efd8f9-c92c-410e-9acc-a00aedd6f2b5
Faulting package full name:
Faulting package-relative application ID:
Since 0xc0000028 is a stack corruption issue, and everything else I am using is pure python, it implies that the compiled C code is responsible for this… Python 3.8 almost never crashes even under absurd memory load, etc.
I can attach a debugger and get a stack trace shortly as it will help narrow down which C code is the culprit.
File "C:\Program Files\Python38\lib\site-packages\pysat\examples\lbx.py", line 287, in enumerate
mcs = self.compute()
File "C:\Program Files\Python38\lib\site-packages\pysat\examples\lbx.py", line 271, in compute
self._compute()
File "C:\Program Files\Python38\lib\site-packages\pysat\examples\lbx.py", line 390, in _compute
self.do_cld_check(self.setd[i:])
File "C:\Program Files\Python38\lib\site-packages\pysat\examples\lbx.py", line 441, in do_cld_check
self._filter_satisfied(update_setd=True)
File "C:\Program Files\Python38\lib\site-packages\pysat\examples\lbx.py", line 355, in _filter_satisfied
if not self.satc[i]:
File "C:\Program Files\Python38\lib\site-packages\pysat\examples\lbx.py", line 355, in _filter_satisfied
if not self.satc[i]:
KeyboardInterrupt
The program '[76260] python.exe' has exited with code -1073741784 (0xc0000028).
This is not particularly helpful beyond showing it definitely is some type of memory mismanagement in the object deletion chain for LBX. Obviously something is not cleaning itself up properly and causing crashes.
About this issue
- Original URL
- State: open
- Created 4 years ago
- Comments: 27 (13 by maintainers)
Disabling the signal handler causes a pass through but it’s not a good solution at all. Obviously something with the use of PyOS_setsig is the issue.
One known issue is use of import_array() for numpy. Apparently it sets up signal handlers and can throw off the expected behavior.
With a simple debug build and debugging session this is an easy solve. Just pinpoint the root cause of the crash and it won’t be hard to back track abd figure out the code issue. So this discussion seems a bit silly. Disabling the signal handler is not a fix. This should be a portable Windows supported API. Some small detail is misimplemented. It’s likely buggy on Linux too, it’s just there for whatever reason it doesn’t crash. This is with high likelihood a universal bug from a clean coding practice standpoint. Microsoft tends to be more standards compliant these days than anyone else. So stability on Windows or MSVC means stability on Linux or GCC. But the reverse is never true. In fact, despite all the hate towards Microsoft and praise towards Linux over the years, the best programmers are invariably always the ones who can write portable code. So again to conclude, this project has poor signal handling code which requires manual debugging to solve.
Also this looks like a bug in glucose from the Python stack trace.
I found pretty simple but not trivial hard CNF borrowed from https://github.com/arminbiere/satch repository and this error now is reproducable:
On NT family (Windows) it means:
in WSL2 (Ubuntu 22.04 LTS) similar situation took:
I see, 3.12 isn’t supported yet.
Sure, but the traceback log you attached clearly shows the execution was interrupted in Python. It would help to catch it when it is inside one of the solver’s code.
I have access to Windows, will try to reproduce it.
Yep, that’s a problem. It used to work properly a few months ago but now this is what happens. I need to investigate what causes it.