ADIOS2 v2.7.1: Spack Build on Summit Breaks IB

Hi,

I am using the E4S Spack package of ADIOS2 v2.7.1 on Summit (/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/). We are really happy about that opportunity, because it saves us from having to maintain and compile ADIOS2 ourselves for the whole WarpX & PIConGPU user base, which eases our support burden a lot and gives us a fast-track to new ADIOS2 releases for production runs.

Unfortunately, I think some components of ADIOS2 fail to declare InfiniBand (IB) / libfabric or MPI-related dependencies: as soon as I link ADIOS2 into my application (ECP WarpX), I get the following errors at application startup, even if I don’t use any ADIOS functionality:

Number of active IB device ports not supported
[g12n09:264854] Error: common_pami.c:1094 - ompi_common_pami_init() 0: Unable to create 1 PAMI communication context(s) rc=1
--------------------------------------------------------------------------
No components were able to be opened in the pml framework.

This typically means that either no components of this type were
installed, or none of the installed components can be loaded.
Sometimes this means that shared libraries required by these
components are unable to be found/loaded.

  Host:      g12n09
  Framework: pml
--------------------------------------------------------------------------
[g12n09:264854] PML pami cannot be selected

I also opened an OLCF ticket for this (OLCFHELP-3319) and linked to here for coordination.

In parallel, it would be perfect if you @pnorbert @chuckatkins @eisenhauer @williamfgc @sklasky et al. could coordinate with OLCF support, your Spack package maintainers, and E4S, so that the SST component’s CMake logic and the depends_on declarations in your Spack package together declare the right dependencies.

Furthermore, the Spack ADIOS2 package could declare simple acceptance tests (build-time checks and smoke tests) so that system administrators can automatically validate that the compiled package really works. Those tests can be a small subset of your regular ctests that checks that MPI et al. truly work.
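To illustrate the kind of smoke test meant here, below is a minimal Python sketch that runs a command and scans its output for the startup failures quoted in this report. It is not Spack's own test mechanism; the names `FAILURE_SIGNATURES`, `find_startup_failures`, and `run_smoke_test` are hypothetical, and the command to run (for example, a small ADIOS2 test binary under the site's job launcher) is an assumption left to the caller.

```python
import subprocess

# Failure signatures copied from the startup errors quoted in this report.
FAILURE_SIGNATURES = (
    "Number of active IB device ports not supported",
    "Unable to create 1 PAMI communication context(s)",
    "No components were able to be opened in the pml framework",
    "PML pami cannot be selected",
)


def find_startup_failures(log_text):
    """Return the known failure signatures that occur in log_text."""
    return [sig for sig in FAILURE_SIGNATURES if sig in log_text]


def run_smoke_test(command):
    """Run command (a list of strings) and report whether startup looks healthy.

    Returns (ok, failures): ok is True only if the process exited with
    status 0 and none of the known signatures appeared in its output.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    failures = find_startup_failures(result.stdout + result.stderr)
    return (result.returncode == 0 and not failures), failures
```

A packaging or CI hook could call `run_smoke_test([...])` on a tiny MPI program linked against the installed ADIOS2 and fail the install when `ok` is False.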

References

ldd libadios2_cxx11_mpi.so

I noticed that libevent is picked up from the system and not declared as depends_on in Spack. Maybe that causes this?

I also found that libcrypto.so and libz.so are not declared as dependencies and are picked up from the system for /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/spectrum-mpi-10.4.0.3-20210112-2s7kpbzydf6val7k2d3e6cz3zdhtcwlw/lib/libmpiprofilesupport.so.3

$ ldd $OLCF_ADIOS2_ROOT/lib64/libadios2_cxx11_mpi.so
	linux-vdso64.so.1 (0x0000200000050000)
	libadios2_cxx11.so.2 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/libadios2_cxx11.so.2 (0x00002000000a0000)
	libadios2_core_mpi.so.2 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/libadios2_core_mpi.so.2 (0x0000200000260000)
	libadios2_core.so.2 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/libadios2_core.so.2 (0x00002000003f0000)
	libpthread.so.0 => /lib64/power9/libpthread.so.0 (0x0000200000b20000)
	libmpiprofilesupport.so.3 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/spectrum-mpi-10.4.0.3-20210112-2s7kpbzydf6val7k2d3e6cz3zdhtcwlw/lib/libmpiprofilesupport.so.3 (0x0000200000b70000)
	libmpi_ibm.so.3 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/spectrum-mpi-10.4.0.3-20210112-2s7kpbzydf6val7k2d3e6cz3zdhtcwlw/lib/libmpi_ibm.so.3 (0x0000200000ba0000)
	libstdc++.so.6 => /autofs/nccs-svm1_sw/summit/gcc/9.3.0-2/lib64/libstdc++.so.6 (0x0000200000d60000)
	libm.so.6 => /lib64/power9/libm.so.6 (0x0000200000ff0000)
	libgcc_s.so.1 => /autofs/nccs-svm1_sw/summit/gcc/9.3.0-2/lib64/libgcc_s.so.1 (0x0000200001120000)
	libc.so.6 => /lib64/power9/libc.so.6 (0x0000200001160000)
	libadios2_taustubs.so => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/../lib64/libadios2_taustubs.so (0x0000200001370000)
	libblosc.so.1 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/c-blosc-1.21.0-s6xjxljpolgxpoh73tuwgjhvtcy2fgh4/lib/libblosc.so.1 (0x00002000013a0000)
	libbz2.so.1.0 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/bzip2-1.0.8-rgx7teyivl2lbzzznj3vyy5j64mw3n7g/lib/libbz2.so.1.0 (0x00002000013d0000)
	libzfp.so.0 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/zfp-0.5.5-va2z6zttx57egp3n4c4sddhx2mp6ysc4/lib64/libzfp.so.0 (0x0000200001410000)
	libSZ.so => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/sz-2.1.11.1-7hgfrsk5r7e7yoyupgiiaprbh763igee/lib64/libSZ.so (0x0000200001470000)
	libpng16.so.16 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/libpng-1.6.37-q2i4njub2bbhfheqpqjcwjbta6az5ret/lib/libpng16.so.16 (0x0000200001580000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00002000015f0000)
	libfabric.so.1 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/libfabric-1.12.1-fcjy5sxzwx2fiflbpirvo5qzwd73pa5p/lib/libfabric.so.1 (0x0000200001620000)
	libadios2_evpath.so => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/../lib64/libadios2_evpath.so (0x0000200001730000)
	libadios2_ffs.so.1 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/../lib64/libadios2_ffs.so.1 (0x00002000017e0000)
	libadios2_atl.so.2 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/../lib64/libadios2_atl.so.2 (0x0000200001880000)
	libz.so.1 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/zlib-1.2.11-5ylwliurlfdizyqq2h2ujwdvl44ukaef/lib/libz.so.1 (0x00002000018b0000)
	libzstd.so.1 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/zstd-1.5.0-j6742s2uqme6gcdwuxhwbqajmbmuwpki/lib64/libzstd.so.1 (0x00002000018f0000)
	/lib64/ld64.so.2 (0x0000200000000000)
	librt.so.1 => /lib64/power9/librt.so.1 (0x00002000019f0000)
	libutil.so.1 => /lib64/libutil.so.1 (0x0000200001a20000)
	libhwloc_ompi.so.15 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/spectrum-mpi-10.4.0.3-20210112-2s7kpbzydf6val7k2d3e6cz3zdhtcwlw/lib/libhwloc_ompi.so.15 (0x0000200001a50000)
	libevent_core-2.1.so.6 => /lib64/libevent_core-2.1.so.6 (0x0000200001ae0000)
	libevent_pthreads-2.1.so.6 => /lib64/libevent_pthreads-2.1.so.6 (0x0000200001b70000)
	libopen-rte.so.3 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/spectrum-mpi-10.4.0.3-20210112-2s7kpbzydf6val7k2d3e6cz3zdhtcwlw/lib/libopen-rte.so.3 (0x0000200001ba0000)
	libopen-pal.so.3 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/spectrum-mpi-10.4.0.3-20210112-2s7kpbzydf6val7k2d3e6cz3zdhtcwlw/lib/libopen-pal.so.3 (0x0000200001cc0000)
	liblz4.so.1 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/lz4-1.9.3-vfbfbfgzir4xbt3nlmbbkbl2rgyfkngq/lib/liblz4.so.1 (0x0000200001e20000)
	libgomp.so.1 => /autofs/nccs-svm1_sw/summit/gcc/9.3.0-2/lib64/libgomp.so.1 (0x0000200001e80000)
	librdmacm.so.1 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/rdma-core-34.0-grvfqvzut26nwzs5yhfku42hqeolx63k/lib64/librdmacm.so.1 (0x0000200001ef0000)
	libibverbs.so.1 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/rdma-core-34.0-grvfqvzut26nwzs5yhfku42hqeolx63k/lib64/libibverbs.so.1 (0x0000200001f30000)
	libatomic.so.1 => /sw/summit/gcc/9.3.0-2/lib/../lib64/libatomic.so.1 (0x0000200001f80000)
	libadios2_dill.so.2 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/../lib64/../lib64/libadios2_dill.so.2 (0x0000200001fb0000)
	libcrypto.so.1.1 => /lib64/libcrypto.so.1.1 (0x0000200002010000)
	libnl-3.so.200 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/libnl-3.3.0-2b2gen3crztivrqz4rnyafw5mpjib7cv/lib/libnl-3.so.200 (0x0000200002380000)
	libnl-route-3.so.200 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/libnl-3.3.0-2b2gen3crztivrqz4rnyafw5mpjib7cv/lib/libnl-route-3.so.200 (0x00002000023d0000)
	libffi.so.7 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/libffi-3.3-vgzin2dcgkwyvp5nq75ioktj7adpw35w/lib64/libffi.so.7 (0x0000200002480000)
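Spotting system-provided libraries in a listing this long is easier with a small filter. The following is a rough Python sketch, assuming the `name => path (0xaddr)` line shape shown above and treating `/lib64`, `/usr/lib64`, `/lib`, and `/usr/lib` as the "system" locations; the names `LDD_LINE`, `SYSTEM_PREFIXES`, and `system_libraries` are made up for this example.

```python
import re

# Matches ldd lines of the form "name => /path (0xaddr)".
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)\s+\(0x[0-9a-fA-F]+\)\s*$")

# Assumption: these prefixes count as system locations on the node.
SYSTEM_PREFIXES = ("/lib64/", "/usr/lib64/", "/lib/", "/usr/lib/")


def system_libraries(ldd_output):
    """Return {soname: path} for libraries resolved from system directories."""
    found = {}
    for line in ldd_output.splitlines():
        match = LDD_LINE.match(line)
        # Lines without "=>" (vdso, the dynamic loader) do not match and are skipped.
        if match and match.group(2).startswith(SYSTEM_PREFIXES):
            found[match.group(1)] = match.group(2)
    return found
```

Fed the output above, this would also list glibc pieces such as libc.so.6 and libpthread.so.0, which are expected to come from the system; the entries worth a closer look are the ones like libevent and libcrypto.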

ldd warpx

$ ldd build/bin/warpx | grep -i adios
	libadios2_cxx11_mpi.so.2 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/libadios2_cxx11_mpi.so.2 (0x00007fffb3130000)
	libadios2_cxx11.so.2 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/libadios2_cxx11.so.2 (0x00007fffb2f70000)
	libadios2_core_mpi.so.2 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/libadios2_core_mpi.so.2 (0x00007fffb2690000)
	libadios2_core.so.2 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/libadios2_core.so.2 (0x00007fffb1f80000)
	libadios2_taustubs.so => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/../lib64/libadios2_taustubs.so (0x00007fffb1b70000)
	libadios2_evpath.so => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/../lib64/libadios2_evpath.so (0x00007fffb1740000)
	libadios2_ffs.so.1 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/../lib64/libadios2_ffs.so.1 (0x00007fffb16a0000)
	libadios2_atl.so.2 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/../lib64/libadios2_atl.so.2 (0x00007fffb1670000)
	libadios2_dill.so.2 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/../lib64/../lib64/libadios2_dill.so.2 (0x00007fffb1010000)

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 15 (13 by maintainers)

Most upvoted comments

@chuckatkins @ax3l Following up on the OLCFHELP-3319 ticket: yesterday afternoon we installed a libfabric/1.12.1-sysrdma module on Summit that uses the system rdma-core, and we have rebuilt our adios2/2.7.1 modules against it. If you try it out, please update the OLCF ticket with any other issues you find.

I assume I (user) do not need to load libfabric/1.12.1-sysrdma and am good to just load the updated adios2/2.7.1 module, right?

Correct - in theory, it should not be necessary to load any libfabric module; only adios2/2.7.1. If you find otherwise, let us know.

Thanks a lot for the quick suggestion @chuckatkins! Just checked - it looks like the module does not exist anymore after the system upgrade to RHEL8.

An easy workaround should be to just use LD_PRELOAD:

$ export LD_PRELOAD="/usr/lib64/libibverbs.so.1:/usr/lib64/librdmacm.so.1"

If a custom Summit-specific libfabric is the solution … Just curious: is that an IBM or Mellanox fork of libfabric on Summit? Or were the changes/fixes upstreamed?

The problem is a mismatch in the rdma-core libraries used by the IBM-provided Spectrum MPI RPM and the Spack-provided libfabric. Spectrum MPI is built against the system-provided rdma-core package, while Spack’s libfabric is built against the Spack-provided rdma-core package. The result is that either MPI or libfabric works, but not both in the same build. Setting LD_PRELOAD to the system rdma-core libraries ensures that Spack’s libfabric uses them at runtime (the converse causes Spectrum MPI to segfault).
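As a rough way to check a binary for this mismatch, here is a minimal Python sketch that looks for rdma-core libraries resolved outside the system directories in `ldd` output and builds the matching LD_PRELOAD value. The names `RDMA_CORE_LIBS`, `non_system_rdma_libs`, and `ld_preload_for` are hypothetical, and `/usr/lib64` as the system location is taken from the workaround quoted in this thread, not verified for other systems.

```python
# The two rdma-core libraries named in the LD_PRELOAD workaround.
RDMA_CORE_LIBS = ("libibverbs.so.1", "librdmacm.so.1")

# Assumption: the system rdma-core lives here, as in the workaround.
SYSTEM_LIB_DIR = "/usr/lib64"


def non_system_rdma_libs(ldd_output):
    """Return {soname: path} for rdma-core libraries not resolved from system dirs."""
    result = {}
    for line in ldd_output.splitlines():
        parts = line.split()
        # Expected shape: ['libibverbs.so.1', '=>', '/path/libibverbs.so.1', '(0x...)']
        if len(parts) >= 3 and parts[1] == "=>" and parts[0] in RDMA_CORE_LIBS:
            if not parts[2].startswith(("/lib64/", "/usr/lib64/")):
                result[parts[0]] = parts[2]
    return result


def ld_preload_for(sonames, lib_dir=SYSTEM_LIB_DIR):
    """Build a colon-separated LD_PRELOAD value pointing at lib_dir."""
    return ":".join(f"{lib_dir}/{name}" for name in sonames)
```

If `non_system_rdma_libs` returns a non-empty mapping for a binary that also links Spectrum MPI, exporting `LD_PRELOAD` with the string from `ld_preload_for(RDMA_CORE_LIBS)` corresponds to the workaround described in this thread.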

This is a known problem on Summit: a mismatch between the rdma-core libraries used by spectrum-mpi and Spack’s libfabric. I believe the admins built a specific module for us to fix this. Try using the libfabric/1.7.0-sysrdma module instead of the default libfabric.