cocotb: Simple Python exception causes segmentation fault on VCS

We have been struggling with random segmentation faults when running our verification suite on VCS, and until now we had assumed the problem was in our own code. Recently, however, I noticed similar behaviour with a very simple test case, which points to a problem in cocotb itself.

Let’s say I have the following simple test:

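import cocotb
# do_something is defined in the second file shown below. The module name
# "helpers" is a placeholder; use whatever that file is actually called.
from helpers import do_something

@cocotb.test()
def check_top_level(dut):
    """Checks the simulation was built and can be clocked."""
    yield do_something(dut)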

In another file I have this:

import cocotb
# Note I have commented out this import on purpose...
# from cocotb.triggers import Timer

@cocotb.coroutine
def simple_coroutine(dut):
    yield Timer(500)

@cocotb.coroutine
def do_something(dut):
    # dut.i_bank_rst__b_a = 0    # Uncomment me to cause a segfault
    yield simple_coroutine(dut)

Now, if you run this as-is, you get an exception because Timer cannot be resolved. However, if you access anything in the Verilog (e.g. uncomment the line that drives a reset signal in our design), what would normally be reported as an exception instead causes a segfault. Our guess is that the embedded Python interpreter shuts down before the exception is printed, and Python then tries to access invalid memory.
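To illustrate why the missing import only surfaces once the coroutine actually runs, here is a minimal sketch using a plain Python generator in place of a cocotb coroutine. It assumes nothing about cocotb internals; it only shows that Python looks up the name Timer when the generator body executes, not when the function is defined or called.

```python
def simple_coroutine():
    # Timer is deliberately left undefined, mirroring the commented-out import.
    yield Timer(500)  # noqa: F821


# Creating the generator does not run its body, so nothing fails yet.
gen = simple_coroutine()

raised = None
try:
    # The body runs up to the first yield here, and the name lookup fails.
    next(gen)
except NameError as exc:
    raised = exc

print(type(raised).__name__, raised)
```

So the exception is raised from inside the scheduler's first step into the coroutine, which is consistent with the failure appearing after the signal access but before the first yield completes.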

We have seen similar issues (a segfault or a lockup) when a Python exception occurs after we have accessed the Verilog but before the first yield of a coroutine. It seems to be related to the embedded Python in cocotb doing something wrong when shutting down.

If we could get to the root of this issue in cocotb, I think we could then start tracking down the real cause of our segfaults, which could be a simple Python exception coming from somewhere.

About this issue

  • State: open
  • Created 5 years ago
  • Reactions: 1
  • Comments: 45 (19 by maintainers)

Most upvoted comments

I was hitting the same issue with VCS and can confirm that either of the following fixes the issue if put at the end of the sequence:

await RisingEdge(tb.clk)
await Timer(1, units='ns')

I think on most simulators this is when the sim ends as stop_simulator never returns on Questa or Icarus. But Py_Finalize isn’t called until the shutdown callback is run. Can you put in some debugging prints to ensure it is being run?

Icarus and Aldec Active-HDL return from gpi_sim_end, which is why I added the sim_ending flag in simulatormodule.c and set it in stop_simulator. Once the cocotb scheduler returns back into handle_gpi_callback, gpi_sim_end is called which eventually calls Py_Finalize.
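The ordering described above can be sketched as a toy Python model. This is not cocotb's C code: the class and its event list are hypothetical, and it assumes that handle_gpi_callback consults the sim_ending flag after the scheduler returns, which the comment implies but does not state outright.

```python
class ToyGpi:
    """Toy model of the shutdown ordering; not the real GPI implementation."""

    def __init__(self):
        self.sim_ending = False
        self.finalized = False
        self.events = []

    def stop_simulator(self):
        # Called from Python while the scheduler is still running.
        # It only records the request; it does not tear anything down.
        self.sim_ending = True
        self.events.append("stop_simulator")

    def gpi_sim_end(self):
        self.events.append("gpi_sim_end")
        self._py_finalize()

    def _py_finalize(self):
        # Stand-in for Py_Finalize: must happen once, and last.
        if self.finalized:
            raise RuntimeError("interpreter finalized twice")
        self.finalized = True
        self.events.append("Py_Finalize")

    def handle_gpi_callback(self, scheduler_step):
        # Run Python code first; end the sim only after it has returned.
        scheduler_step(self)
        self.events.append("scheduler returned")
        if self.sim_ending:
            self.gpi_sim_end()


def failing_test(gpi):
    # Hypothetical scheduler step: a test fails and asks for shutdown.
    gpi.stop_simulator()


gpi = ToyGpi()
gpi.handle_gpi_callback(failing_test)
print(gpi.events)
```

In this model the interpreter is finalized only after the scheduler has returned control, which is the ordering the flag is meant to guarantee on simulators that return from gpi_sim_end.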

@markmelvin The line in question should not need to be conditioned. It’s worth investigating why it does in your case, and fixing that issue instead.

Indeed, running my original test case from this issue, I can now see a proper shutdown and error message with registered_impls[0]->sim_end(); commented out as per above. I will try to step through the code and see what exactly is going wrong here and post more info as I get it.

@markmelvin: if you’re desperate to work out what your exception is, add some logging to cocotb.outcomes.Error.__init__ in your local installation, which will definitely be reached if any error occurs.
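As a rough sketch of that kind of logging, the helper below wraps a class's __init__ at runtime rather than editing the installed file, which is a substitution for the approach suggested above. The helper name log_init_calls and the StandInError class are hypothetical; StandInError only mimics a class that stores the exception it is given, so that the example runs without a simulator. In a real session you would pass cocotb.outcomes.Error instead, assuming your cocotb version provides it.

```python
import functools
import sys


def log_init_calls(cls, stream=sys.stderr):
    """Wrap cls.__init__ so every construction is reported immediately."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def logged_init(self, *args, **kwargs):
        stream.write(f"{cls.__name__}.__init__ called with {args!r} {kwargs!r}\n")
        # Flush straight away so the message survives a later crash.
        stream.flush()
        original_init(self, *args, **kwargs)

    cls.__init__ = logged_init
    return cls


class StandInError:
    """Hypothetical stand-in for the outcome class that stores an exception."""

    def __init__(self, error):
        self.error = error


log_init_calls(StandInError)
outcome = StandInError(NameError("name 'Timer' is not defined"))
```

Writing to stderr and flushing on every call matters here because the process may segfault before normal buffered output is written.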

Setting DEBUG=1 gives a bit more info for this particular issue:

--- Stack trace follows:

Dumping VCS Annotated Stack:
#0  0x00002aaab2cdebbc in waitpid () from /lib64/libc.so.6
#1  0x00002aaab2c5cea2 in do_system () from /lib64/libc.so.6
#2  0x00002aaaabacba4c in SNPSle_10ee25eff68cd8461c9146fa1d0b35e87067f3c8015b313e639d2928478c79b3f673f99203bcf8be64600612100082236bffb2007f1e0ef9 () from /server/cad/synopsys/vcs-mx/O-2018.09-1/linux64/lib/liberrorinf.so
#3  0x00002aaaabacd58e in SNPSle_10ee25eff68cd8461c9146fa1d0b35e87067f3c8015b313efba706aab251478fa49e66610e453774633a6c152e7ef778f2202cda681f3d4e () from /server/cad/synopsys/vcs-mx/O-2018.09-1/linux64/lib/liberrorinf.so
#4  0x00002aaaabac5fc3 in SNPSle_d35ca1ff70d465c2b9b1a72eee90a506fdd009d3de3db1de () from /server/cad/synopsys/vcs-mx/O-2018.09-1/linux64/lib/liberrorinf.so
#5  0x00002aaaae06416f in SNPSle_64133461705005bb725549e2e6fa1b3f () from /server/cad/synopsys/vcs-mx/O-2018.09-1/linux64/lib/libvcsnew.so
#6  0x00002aaaadeabdd9 in SNPSle_82244d58c54c18c70d63edc9becab634 () from /server/cad/synopsys/vcs-mx/O-2018.09-1/linux64/lib/libvcsnew.so
#8  0x00002aaabf3c91a2 in update_refs (containers=0x2aaabf74b5d0 <_PyRuntime+400>) at Modules/gcmodule.c:243
#10 0x00002aaabf3ca8e5 in collect_with_callback (generation=2) at Modules/gcmodule.c:1028
#12 0x00002aaabf3ca94d in _PyGC_CollectIfEnabled () at Modules/gcmodule.c:1587
#13 0x00002aaabf39abeb in Py_FinalizeEx () at Python/pylifecycle.c:1185
#14 0x00002aaabf9538cb in embed_sim_cleanup () at gpi_embed.c:199
#15 0x00002aaabefe1108 in gpi_cleanup () at GpiCommon.cpp:145
#16 0x00002aaabefe10c8 in gpi_embed_end () at GpiCommon.cpp:134
#17 0x00002aaabebc46e5 in VpiShutdownCbHdl::run_callback (this=0xdb5b50) at VpiCbHdl.cpp:474
#18 0x00002aaabebc04de in handle_vpi_callback (cb_data=0xd629e0) at VpiImpl.cpp:556
#19 0x00002aaaad4c08c0 in SNPSle_27959203ce5ad0720e45d4754d3a515910cb8b7ca4bb47195752011ce913284754e716cad71ddff5 () from /server/cad/synopsys/vcs-mx/O-2018.09-1/linux64/lib/libvcsnew.so
#20 0x00002aaaad4c34f6 in SNPSle_eed38e464c7100f70d40bb073590aafa () from /server/cad/synopsys/vcs-mx/O-2018.09-1/linux64/lib/libvcsnew.so
#21 0x00002aaaaf514215 in SNPSle_294d108d9c21223f31bca47d22167e39 () from /server/cad/synopsys/vcs-mx/O-2018.09-1/linux64/lib/libvcsnew.so
#22 0x00002aaaae0779fd in SNPSle_85eeb05ebf66d014 () from /server/cad/synopsys/vcs-mx/O-2018.09-1/linux64/lib/libvcsnew.so
#23 0x0000000000000000 in ?? ()
Completed context dump phase data

During a VPI callback function for callback reason="cbEndOfSimulation" 
During Error message function Code:="VFS_SDB_ERROR