pysat: Ctrl+C causes Python interpreter with pysat loaded to crash not gracefully yield an exception

Any time Ctrl+C is used to interrupt running code, it will throw a KeyboardInterrupt. But something is going on after doing SAT solving which causes it to not work all of the time and instead just exit the interpreter. Either this indicates Ctrl+C is being explicitly caught and inappropriately dealt with or some type of cleanup code is probably cause a crash.

Obviously important is that this is being run on Windows as Linux can have different behavior in this area as it uses a signal handling system which is a bit different. My event viewer clearly shows something is causing crashes and it never happens if I do not load pysat:

Faulting application name: python.exe, version: 3.8.2150.1013, time stamp: 0x5e55a7f0
Faulting module name: ntdll.dll, version: 10.0.18362.815, time stamp: 0xb29ecf52
Exception code: 0xc0000028
Fault offset: 0x00000000000fbf18
Faulting process id: 0x12448
Faulting application start time: 0x01d65234e08c1176
Faulting application path: C:\Program Files\Python38\python.exe
Faulting module path: C:\WINDOWS\SYSTEM32\ntdll.dll
Report Id: 75efd8f9-c92c-410e-9acc-a00aedd6f2b5
Faulting package full name: 
Faulting package-relative application ID: 

Since 0xc0000028 is a stack corruption issue, and everything else I am using is pure python, it implies that the compiled C code is responsible for this… Python 3.8 almost never crashes even under absurd memory load, etc.

I can attach a debugger and get a stack trace shortly as it will help narrow down which C code is the culprit.

  File "C:\Program Files\Python38\lib\site-packages\pysat\examples\lbx.py", line 287, in enumerate
    mcs = self.compute()
  File "C:\Program Files\Python38\lib\site-packages\pysat\examples\lbx.py", line 271, in compute
    self._compute()
  File "C:\Program Files\Python38\lib\site-packages\pysat\examples\lbx.py", line 390, in _compute
    self.do_cld_check(self.setd[i:])
  File "C:\Program Files\Python38\lib\site-packages\pysat\examples\lbx.py", line 441, in do_cld_check
    self._filter_satisfied(update_setd=True)
  File "C:\Program Files\Python38\lib\site-packages\pysat\examples\lbx.py", line 355, in _filter_satisfied
    if not self.satc[i]:
  File "C:\Program Files\Python38\lib\site-packages\pysat\examples\lbx.py", line 355, in _filter_satisfied
    if not self.satc[i]:
KeyboardInterrupt
The program '[76260] python.exe' has exited with code -1073741784 (0xc0000028).

This is not particularly helpful beyond showing it definitely is some type of memory mismanagement in the object deletion chain for LBX. Obviously something is not cleaning itself up properly and causing crashes.

About this issue

  • Original URL
  • State: open
  • Created 4 years ago
  • Comments: 27 (13 by maintainers)

Most upvoted comments

Also this looks like a bug in glucose from the Python stack trace.

Interesting, will try with more fresh Glucose (4+) and also if you right, KeyboardInterrupt should work with other solvers.

Error is reproducable with all solvers including minisat (by default).

I found a place in solvers/pysolvers.cc and after local change in method py_glucose3_solve start to receive KeyboardInterrupt also on Windows:

image

(.venv) PS C:\Workspace\References\pysathq-pysat> python.exe .\examples\lbx.py -d -e all -s glucose3 -vv .\examples\cnfs\prime65537.cnf
Traceback (most recent call last):
  File "C:\Workspace\References\pysathq-pysat\examples\lbx.py", line 584, in <module>
    for i, mcs in enumerate(mcsls.enumerate()):
  File "C:\Workspace\References\pysathq-pysat\examples\lbx.py", line 301, in enumerate
    mcs = self.compute()
          ^^^^^^^^^^^^^^
  File "C:\Workspace\References\pysathq-pysat\examples\lbx.py", line 285, in compute
    self._compute()
  File "C:\Workspace\References\pysathq-pysat\examples\lbx.py", line 408, in _compute
    if self.oracle.solve(assumptions=self.ss_assumps + self.bb_assumps + [self.setd[i]]):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Workspace\References\pysathq-pysat\.venv\Lib\site-packages\python_sat-0.1.8.dev12-py3.12-win-amd64.egg\pysat\solvers.py", line 564, in solve 
    return self.solver.solve(assumptions)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Workspace\References\pysathq-pysat\.venv\Lib\site-packages\python_sat-0.1.8.dev12-py3.12-win-amd64.egg\pysat\solvers.py", line 2827, in solve
    self.status = pysolvers.glucose3_solve(self.glucose, assumptions,
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt
(.venv) PS C:\Workspace\References\pysathq-pysat> $LASTEXITCODE
-1073741510

where exit Code 0xC000013A exactly means that the application terminated as a result of a CTRL+C.

Without this change glucose4 still exit silently:

(.venv) PS C:\Workspace\References\pysathq-pysat> python.exe .\examples\lbx.py -d -e all -s glucose4 -vv .\examples\cnfs\prime65537.cnf
(.venv) PS C:\Workspace\References\pysathq-pysat> $LASTEXITCODE
-1073741784

P.S. File pysolvers.cc contains a large amount of duplicate code for each solver and is an excellent candidate for a code generator, I will not change it yet, so as not to break the logic on other platforms.

Disabling the signal handler causes a pass through but it’s not a good solution at all. Obviously something with the use of PyOS_setsig is the issue.

One known issue is use of import_array() for numpy. Apparently it sets up signal handlers and can throw off the expected behavior.

With a simple debug build and debugging session this is an easy solve. Just pinpoint the root cause of the crash and it won’t be hard to back track abd figure out the code issue. So this discussion seems a bit silly. Disabling the signal handler is not a fix. This should be a portable Windows supported API. Some small detail is misimplemented. It’s likely buggy on Linux too, it’s just there for whatever reason it doesn’t crash. This is with high likelihood a universal bug from a clean coding practice standpoint. Microsoft tends to be more standards compliant these days than anyone else. So stability on Windows or MSVC means stability on Linux or GCC. But the reverse is never true. In fact, despite all the hate towards Microsoft and praise towards Linux over the years, the best programmers are invariably always the ones who can write portable code. So again to conclude, this project has poor signal handling code which requires manual debugging to solve.

Also this looks like a bug in glucose from the Python stack trace.

By any chance, can you try it on hard CNF instances where a SaT call takes a lot of time so that you interrupt it during a SAT call?

It was exactly hard CNF formula (89Mb) and I tried several times, but I will try to more deeply analyze the moment of interrupt, ok.

Sure, but the traceback log you attached clearly shows the execution was interrupted in Python. It would help to catch it when it is inside one of the solver’s code.

I found pretty simple but not trivial hard CNF borrowed from https://github.com/arminbiere/satch repository and this error now is reproducable:

(.venv) PS C:\Workspace\References\pysathq-pysat\examples> python.exe lbx.py -d -e all -s glucose3 -vv .\cnfs\prime65537.cnf
(.venv) PS C:\Workspace\References\pysathq-pysat\examples> $LASTEXITCODE
-1073741784

On NT family (Windows) it means:

from ntstatus.h:

//
// MessageId: STATUS_BAD_STACK
//
// MessageText:
//
// An invalid or unaligned stack was encountered during an unwind operation.
//
#define STATUS_BAD_STACK                 ((NTSTATUS)0xC0000028L)

in WSL2 (Ubuntu 22.04 LTS) similar situation took:

du@ABC:/mnt/c/Workspace/References/pysathq-pysat/examples$ python3 lbx.py -d -e all -s glucose3 -vv cnfs/prime65537.cnf
^CTraceback (most recent call last):
  File "/mnt/c/Workspace/References/pysathq-pysat/examples/lbx.py", line 584, in <module>
    for i, mcs in enumerate(mcsls.enumerate()):
  File "/mnt/c/Workspace/References/pysathq-pysat/examples/lbx.py", line 301, in enumerate
    mcs = self.compute()
  File "/mnt/c/Workspace/References/pysathq-pysat/examples/lbx.py", line 285, in compute
    self._compute()
  File "/mnt/c/Workspace/References/pysathq-pysat/examples/lbx.py", line 408, in _compute
    if self.oracle.solve(assumptions=self.ss_assumps + self.bb_assumps + [self.setd[i]]):
  File "/home/du/.local/lib/python3.10/site-packages/pysat/solvers.py", line 564, in solve
    return self.solver.solve(assumptions)
  File "/home/du/.local/lib/python3.10/site-packages/pysat/solvers.py", line 2827, in solve
    self.status = pysolvers.glucose3_solve(self.glucose, assumptions,
pysolvers.error: Caught keyboard interrupt
du@ABC:/mnt/c/Workspace/References/pysathq-pysat/examples$ echo $?
1

I see, 3.12 isn’t supported yet.

image

image

By any chance, can you try it on hard CNF instances where a SaT call takes a lot of time so that you interrupt it during a SAT call?

It was exactly hard CNF formula (89Mb) and I tried several times, but I will try to more deeply analyze the moment of interrupt, ok.

Sure, but the traceback log you attached clearly shows the execution was interrupted in Python. It would help to catch it when it is inside one of the solver’s code.

I have access to Windows, will try to reproduce it.

Yep, that’s a problem. It used to work properly a few months ago but now this is what happens. I need to investigate what causes it.