legion: All tests segfault on Fedora Rawhide with legion-19.04.0

      Start 25: rendering
25/25 Test #25: rendering ........................***Exception: SegFault  0.44 sec
[buildhw-10:7077 :0:7077] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x7ff0db8da948)
==== backtrace ====
    0  /lib64/libucs.so.0(+0x194a3) [0x7ff0db8624a3]
    1  /lib64/libucs.so.0(+0x1965a) [0x7ff0db86265a]
    2  /lib64/libuct.so.0(+0x1b72b) [0x7ff0dbbb072b]
    3  /lib64/ld-linux-x86-64.so.2(+0xfe4a) [0x7ff0e379ee4a]
    4  /lib64/ld-linux-x86-64.so.2(+0xff51) [0x7ff0e379ef51]
    5  /lib64/ld-linux-x86-64.so.2(+0x13eae) [0x7ff0e37a2eae]
    6  /lib64/libc.so.6(_dl_catch_exception+0x79) [0x7ff0e20415b9]
    7  /lib64/ld-linux-x86-64.so.2(+0x1372e) [0x7ff0e37a272e]
    8  /lib64/libdl.so.2(+0x239c) [0x7ff0e1f0339c]
    9  /lib64/libc.so.6(_dl_catch_exception+0x79) [0x7ff0e20415b9]
   10  /lib64/libc.so.6(_dl_catch_error+0x33) [0x7ff0e2041653]
   11  /lib64/libdl.so.2(+0x2af9) [0x7ff0e1f03af9]
   12  /lib64/libdl.so.2(dlopen+0x4a) [0x7ff0e1f0342a]
   13  /usr/lib64/openmpi/lib/libopen-pal.so.40(+0x6ead7) [0x7ff0e1b01ad7]
   14  /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_component_repository_open+0x1f4) [0x7ff0e1adf524]
   15  /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_component_find+0x35b) [0x7ff0e1ade4eb]
   16  /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_framework_components_register+0x2e) [0x7ff0e1ae9dfe]
   17  /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_framework_register+0x256) [0x7ff0e1aea2e6]
   18  /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_framework_open+0x14) [0x7ff0e1aea344]
   19  /usr/lib64/openmpi/lib/libmpi.so.40(ompi_mpi_init+0x695) [0x7ff0e1db6795]
   20  /usr/lib64/openmpi/lib/libmpi.so.40(PMPI_Init_thread+0x5b) [0x7ff0e1de6bbb]
   21  /builddir/build/BUILD/legion-legion-19.04.0/openmpi/lib/librealm.so.1(AMMPI_SPMDSetThreadMode+0x182) [0x7ff0e2c879e2]
   22  /builddir/build/BUILD/legion-legion-19.04.0/openmpi/lib/librealm.so.1(gex_Client_Init_GASNET_201930PARpshmFASTnodebugnotracenostatsnodebugmallocnosrclines+0x15d) [0x7ff0e2c15e5d]
   23  /builddir/build/BUILD/legion-legion-19.04.0/openmpi/lib/librealm.so.1(_ZN5Realm11RuntimeImpl12network_initEPiPPPc+0x120) [0x7ff0e2bc6f30]
   24  /builddir/build/BUILD/legion-legion-19.04.0/openmpi/lib/liblegion.so.1(_ZN6Legion8Internal7Runtime10initializeEPiPPPc+0x24) [0x7ff0e35ba694]
   25  /builddir/build/BUILD/legion-legion-19.04.0/openmpi/lib/liblegion.so.1(_ZN6Legion8Internal7Runtime5startEiPPcb+0x250) [0x7ff0e35f49f0]
   26  /builddir/build/BUILD/legion-legion-19.04.0/openmpi/bin/rendering(main+0x1b7) [0x556a7f716047]
   27  /lib64/libc.so.6(__libc_start_main+0xf3) [0x7ff0e1f2ef73]
   28  /builddir/build/BUILD/legion-legion-19.04.0/openmpi/bin/rendering(_start+0x2e) [0x556a7f7160de]
===================
0% tests passed, 25 tests failed out of 25
Total Test time (real) =  11.09 sec
The following tests FAILED:
	  1 - attach_file (SEGFAULT)
	  2 - circuit (SEGFAULT)
	  3 - dynamic_registration (SEGFAULT)
	  4 - ghost (SEGFAULT)
	  5 - ghost_pull (SEGFAULT)
	  6 - realm_saxpy (SEGFAULT)
	  7 - realm_stencil (SEGFAULT)
	  8 - spmd_cgsolver (SEGFAULT)
	  9 - virtual_map (SEGFAULT)
	 10 - attach_2darray (SEGFAULT)
	 11 - attach_array_daxpy (SEGFAULT)
	 12 - mpi_interop (SEGFAULT)
	 13 - hello_world (SEGFAULT)
	 14 - tasks_and_futures (SEGFAULT)
	 15 - index_tasks (SEGFAULT)
	 16 - global_vars (SEGFAULT)
	 17 - logical_regions (SEGFAULT)
	 18 - physical_regions (SEGFAULT)
	 19 - privileges (SEGFAULT)
	 20 - partitioning (SEGFAULT)
	 21 - multiple_partitions (SEGFAULT)
	 22 - custom_mapper (SEGFAULT)
	 23 - attach_file_mini (SEGFAULT)
	 24 - test_stl (SEGFAULT)
	 25 - rendering (SEGFAULT)
BUILDSTDERR: Errors while running CTest
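
To compare failures across builds or architectures, the "The following tests FAILED" summary above can be pulled out of a saved build log. This is only a minimal sketch, assuming the summary keeps the layout shown in this log; `parse_failed_tests` and the sample string are hypothetical names for illustration and are not part of Legion or CTest.

```python
import re

# Matches summary lines such as "\t  1 - attach_file (SEGFAULT)".
# Assumption: every summary line has the form "<number> - <name> (<reason>)".
FAILED_LINE = re.compile(r"^\s*(\d+)\s+-\s+(\S+)\s+\(([^)]+)\)\s*$")


def parse_failed_tests(ctest_output):
    """Return (number, name, reason) tuples from the CTest failure summary."""
    failures = []
    in_summary = False
    for line in ctest_output.splitlines():
        if line.strip() == "The following tests FAILED:":
            in_summary = True
            continue
        if in_summary:
            match = FAILED_LINE.match(line)
            if match is None:
                break  # the summary ends at the first line that does not match
            failures.append((int(match.group(1)), match.group(2), match.group(3)))
    return failures


# Shortened excerpt of the log above, used only to exercise the parser.
sample = """\
0% tests passed, 25 tests failed out of 25
Total Test time (real) =  11.09 sec
The following tests FAILED:
\t  1 - attach_file (SEGFAULT)
\t  2 - circuit (SEGFAULT)
\t 25 - rendering (SEGFAULT)
BUILDSTDERR: Errors while running CTest
"""

for number, name, reason in parse_failed_tests(sample):
    print(number, name, reason)
```

Because every test here fails with the same reason, grouping the tuples by `reason` would quickly confirm that the crash is common to all tests rather than specific to one of them.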

Details: https://koji.fedoraproject.org/koji/taskinfo?taskID=34577005

This can be reproduced with the following Dockerfile:

FROM fedora:rawhide
RUN dnf install -y spectool wget rpm-build dnf-plugins-core
RUN wget https://src.fedoraproject.org/fork/junghans/rpms/legion/raw/master/f/legion.spec
RUN spectool -g legion.spec
RUN dnf builddep -y legion.spec
RUN dnf install -y make
RUN rpmbuild -D"_sourcedir ${PWD}" -D"_srcrpmdir ${PWD}" -ba legion.spec

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 23 (9 by maintainers)

Most upvoted comments

OK, the original issue is fixed, but now the circuit test fails on ppc64le:

     Start  2: circuit
 2/24 Test  #2: circuit ..........................Child aborted***Exception:   1.39 sec

Yes, all tests still pass. We will just wait for an OpenMPI 4 fix to land in Rawhide.

Looking at https://apps.fedoraproject.org/koschei/package/legion, it seems openmpi-4 broke the build.