ADIOS2 v2.7.1: Spack Build on Summit Breaks IB
Hi,
I am using the E4S Spack package of ADIOS2 v2.7.1 on Summit (/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/). We are really happy about that opportunity, because it saves us from having to maintain and compile ADIOS2 ourselves for the whole WarpX & PIConGPU user base, which eases our support burden a lot and gives us a fast-track to new ADIOS2 releases for production runs.
Unfortunately, I think that some components of ADIOS2 forget to declare InfiniBand (IB) / libfabric or MPI-related dependencies, because the moment I link ADIOS2 to my application (ECP WarpX), I get the following errors on startup of our app, even if I don’t use any ADIOS functionality:
Number of active IB device ports not supported
[g12n09:264854] Error: common_pami.c:1094 - ompi_common_pami_init() 0: Unable to create 1 PAMI communication context(s) rc=1
--------------------------------------------------------------------------
No components were able to be opened in the pml framework.
This typically means that either no components of this type were
installed, or none of the installed components can be loaded.
Sometimes this means that shared libraries required by these
components are unable to be found/loaded.
Host: g12n09
Framework: pml
--------------------------------------------------------------------------
[g12n09:264854] PML pami cannot be selected
I also opened an OLCF ticket for this (OLCFHELP-3319) and linked to here for coordination.
In parallel, it would be perfect if you @pnorbert @chuckatkins @eisenhauer @williamfgc @sklasky et al. could coordinate with OLCF support, your Spack package and E4S to make sure that the SST component’s CMake logic, in combination with the depends_on declarations in your Spack package, declares the right dependencies.
Furthermore, the Spack ADIOS2 package could declare simple acceptance tests (build-time checks and smoke-tests) so that system people can automatically validate that your compiled package really works. Those tests can be a small sub-set of your regular ctests to check that MPI et al. truly work.
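As a rough illustration of such a smoke test, here is a minimal shell sketch. It assumes the check is done by launching an ADIOS2-linked binary and scanning its output for the MPI start-up errors quoted above. The function names `has_mpi_startup_failure` and `smoke_test` are hypothetical, as are the launcher line and binary name in the closing comment; none of this is part of ADIOS2 or its Spack package.

```shell
#!/bin/sh
# Hypothetical helper: scan a captured launch log on stdin for the MPI
# start-up failure signatures quoted in this report.
# Returns 0 if a signature is found, 1 if the log looks clean.
has_mpi_startup_failure() {
    grep -q -E \
        -e 'PML pami cannot be selected' \
        -e 'No components were able to be opened in the pml framework' \
        -e 'Unable to create [0-9]+ PAMI communication context' \
        -e 'Number of active IB device ports not supported'
}

# Hypothetical wrapper: run any command, capture stdout and stderr, and fail
# if the command exits non-zero or prints one of the known signatures.
smoke_test() {
    log=$("$@" 2>&1)
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "smoke test FAILED: exit status $status"
        return 1
    fi
    if printf '%s\n' "$log" | has_mpi_startup_failure; then
        echo "smoke test FAILED: MPI start-up error in output"
        return 1
    fi
    echo "smoke test passed"
}

# Example on a compute node (launcher line and binary are placeholders):
#   smoke_test jsrun -n 1 ./my_adios2_linked_app
```

Checking the output as well as the exit status is a deliberate choice in this sketch: it still flags the run if the launcher happens to return success after printing the PML error.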
References
- Spack’s ADIOS2 package.py
- Two examples to add check/smoke-tests: openPMD-api or WarpX
- OLCF Ticket: OLCFHELP-3319 (I named your emails so they can CC you and you can coordinate with them)
ldd libadios2_cxx11_mpi.so
I noticed that libevent is picked up from the system and not declared as depends_on in Spack. Maybe that causes this?
Also I found libcrypto.so and libz.so are not declared dependencies and picked up from the system for /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/spectrum-mpi-10.4.0.3-20210112-2s7kpbzydf6val7k2d3e6cz3zdhtcwlw/lib/libmpiprofilesupport.so.3
$ ldd $OLCF_ADIOS2_ROOT/lib64/libadios2_cxx11_mpi.so
linux-vdso64.so.1 (0x0000200000050000)
libadios2_cxx11.so.2 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/libadios2_cxx11.so.2 (0x00002000000a0000)
libadios2_core_mpi.so.2 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/libadios2_core_mpi.so.2 (0x0000200000260000)
libadios2_core.so.2 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/libadios2_core.so.2 (0x00002000003f0000)
libpthread.so.0 => /lib64/power9/libpthread.so.0 (0x0000200000b20000)
libmpiprofilesupport.so.3 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/spectrum-mpi-10.4.0.3-20210112-2s7kpbzydf6val7k2d3e6cz3zdhtcwlw/lib/libmpiprofilesupport.so.3 (0x0000200000b70000)
libmpi_ibm.so.3 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/spectrum-mpi-10.4.0.3-20210112-2s7kpbzydf6val7k2d3e6cz3zdhtcwlw/lib/libmpi_ibm.so.3 (0x0000200000ba0000)
libstdc++.so.6 => /autofs/nccs-svm1_sw/summit/gcc/9.3.0-2/lib64/libstdc++.so.6 (0x0000200000d60000)
libm.so.6 => /lib64/power9/libm.so.6 (0x0000200000ff0000)
libgcc_s.so.1 => /autofs/nccs-svm1_sw/summit/gcc/9.3.0-2/lib64/libgcc_s.so.1 (0x0000200001120000)
libc.so.6 => /lib64/power9/libc.so.6 (0x0000200001160000)
libadios2_taustubs.so => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/../lib64/libadios2_taustubs.so (0x0000200001370000)
libblosc.so.1 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/c-blosc-1.21.0-s6xjxljpolgxpoh73tuwgjhvtcy2fgh4/lib/libblosc.so.1 (0x00002000013a0000)
libbz2.so.1.0 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/bzip2-1.0.8-rgx7teyivl2lbzzznj3vyy5j64mw3n7g/lib/libbz2.so.1.0 (0x00002000013d0000)
libzfp.so.0 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/zfp-0.5.5-va2z6zttx57egp3n4c4sddhx2mp6ysc4/lib64/libzfp.so.0 (0x0000200001410000)
libSZ.so => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/sz-2.1.11.1-7hgfrsk5r7e7yoyupgiiaprbh763igee/lib64/libSZ.so (0x0000200001470000)
libpng16.so.16 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/libpng-1.6.37-q2i4njub2bbhfheqpqjcwjbta6az5ret/lib/libpng16.so.16 (0x0000200001580000)
libdl.so.2 => /lib64/libdl.so.2 (0x00002000015f0000)
libfabric.so.1 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/libfabric-1.12.1-fcjy5sxzwx2fiflbpirvo5qzwd73pa5p/lib/libfabric.so.1 (0x0000200001620000)
libadios2_evpath.so => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/../lib64/libadios2_evpath.so (0x0000200001730000)
libadios2_ffs.so.1 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/../lib64/libadios2_ffs.so.1 (0x00002000017e0000)
libadios2_atl.so.2 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/../lib64/libadios2_atl.so.2 (0x0000200001880000)
libz.so.1 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/zlib-1.2.11-5ylwliurlfdizyqq2h2ujwdvl44ukaef/lib/libz.so.1 (0x00002000018b0000)
libzstd.so.1 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/zstd-1.5.0-j6742s2uqme6gcdwuxhwbqajmbmuwpki/lib64/libzstd.so.1 (0x00002000018f0000)
/lib64/ld64.so.2 (0x0000200000000000)
librt.so.1 => /lib64/power9/librt.so.1 (0x00002000019f0000)
libutil.so.1 => /lib64/libutil.so.1 (0x0000200001a20000)
libhwloc_ompi.so.15 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/spectrum-mpi-10.4.0.3-20210112-2s7kpbzydf6val7k2d3e6cz3zdhtcwlw/lib/libhwloc_ompi.so.15 (0x0000200001a50000)
libevent_core-2.1.so.6 => /lib64/libevent_core-2.1.so.6 (0x0000200001ae0000)
libevent_pthreads-2.1.so.6 => /lib64/libevent_pthreads-2.1.so.6 (0x0000200001b70000)
libopen-rte.so.3 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/spectrum-mpi-10.4.0.3-20210112-2s7kpbzydf6val7k2d3e6cz3zdhtcwlw/lib/libopen-rte.so.3 (0x0000200001ba0000)
libopen-pal.so.3 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/spectrum-mpi-10.4.0.3-20210112-2s7kpbzydf6val7k2d3e6cz3zdhtcwlw/lib/libopen-pal.so.3 (0x0000200001cc0000)
liblz4.so.1 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/lz4-1.9.3-vfbfbfgzir4xbt3nlmbbkbl2rgyfkngq/lib/liblz4.so.1 (0x0000200001e20000)
libgomp.so.1 => /autofs/nccs-svm1_sw/summit/gcc/9.3.0-2/lib64/libgomp.so.1 (0x0000200001e80000)
librdmacm.so.1 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/rdma-core-34.0-grvfqvzut26nwzs5yhfku42hqeolx63k/lib64/librdmacm.so.1 (0x0000200001ef0000)
libibverbs.so.1 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/rdma-core-34.0-grvfqvzut26nwzs5yhfku42hqeolx63k/lib64/libibverbs.so.1 (0x0000200001f30000)
libatomic.so.1 => /sw/summit/gcc/9.3.0-2/lib/../lib64/libatomic.so.1 (0x0000200001f80000)
libadios2_dill.so.2 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/../lib64/../lib64/libadios2_dill.so.2 (0x0000200001fb0000)
libcrypto.so.1.1 => /lib64/libcrypto.so.1.1 (0x0000200002010000)
libnl-3.so.200 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/libnl-3.3.0-2b2gen3crztivrqz4rnyafw5mpjib7cv/lib/libnl-3.so.200 (0x0000200002380000)
libnl-route-3.so.200 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/libnl-3.3.0-2b2gen3crztivrqz4rnyafw5mpjib7cv/lib/libnl-route-3.so.200 (0x00002000023d0000)
libffi.so.7 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/libffi-3.3-vgzin2dcgkwyvp5nq75ioktj7adpw35w/lib64/libffi.so.7 (0x0000200002480000)
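To spot libraries like libevent and libcrypto in a listing this long, the `ldd` output can be filtered for entries that resolve to a system directory. The sketch below is only an illustration: the function name `list_system_libs` is hypothetical, "system" is assumed to mean paths under /lib, /lib64, /usr/lib or /usr/lib64, and the core glibc libraries are skipped on the assumption that they are expected to come from the system.

```shell
#!/bin/sh
# Hypothetical helper: read `ldd` output on stdin and print every library
# that resolves to a system directory instead of a Spack install prefix.
# Assumption: system directories are /lib, /lib64, /usr/lib and /usr/lib64.
# Core glibc libraries (libc, libm, libdl, librt, libutil, libpthread) are
# skipped because they are expected to come from the system.
list_system_libs() {
    awk '
        $2 == "=>" &&
        $3 ~ /^\/(usr\/)?lib(64)?\// &&
        $1 !~ /^lib(c|m|dl|rt|util|pthread)\.so/ {
            print $1 " -> " $3
        }
    '
}

# Example:
#   ldd "$OLCF_ADIOS2_ROOT/lib64/libadios2_cxx11_mpi.so" | list_system_libs
```

Applied to the listing above, this filter would leave only the two libevent entries and libcrypto.so.1.1.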
ldd warpx
$ ldd build/bin/warpx | grep -i adios
libadios2_cxx11_mpi.so.2 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/libadios2_cxx11_mpi.so.2 (0x00007fffb3130000)
libadios2_cxx11.so.2 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/libadios2_cxx11.so.2 (0x00007fffb2f70000)
libadios2_core_mpi.so.2 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/libadios2_core_mpi.so.2 (0x00007fffb2690000)
libadios2_core.so.2 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/libadios2_core.so.2 (0x00007fffb1f80000)
libadios2_taustubs.so => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/../lib64/libadios2_taustubs.so (0x00007fffb1b70000)
libadios2_evpath.so => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/../lib64/libadios2_evpath.so (0x00007fffb1740000)
libadios2_ffs.so.1 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/../lib64/libadios2_ffs.so.1 (0x00007fffb16a0000)
libadios2_atl.so.2 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/../lib64/libadios2_atl.so.2 (0x00007fffb1670000)
libadios2_dill.so.2 => /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.3.0/adios2-2.7.1-vuplpxvgpbby2oe4sgpbtarlrg55ofxl/lib64/../lib64/../lib64/../lib64/libadios2_dill.so.2 (0x00007fffb1010000)
About this issue
- State: closed
- Created 3 years ago
- Comments: 15 (13 by maintainers)
Commits related to this issue
- Docs: OLCF ADIOS2 Currently Broken Tickets open: - OLCFHELP-3319 - https://github.com/ornladios/ADIOS2/issues/2836 — committed to ECP-WarpX/WarpX by ax3l 3 years ago
- Merge upstream development (#58) * AMReX/PICSAR: Weekly Update (#2199) Weekly update to latest AMReX. No changes in PICSAR. * do_pml should not be parsed anymore. (#2183) * do_pml not parse... — committed to ModernElectron/WarpX by roelof-groenewald 3 years ago
- Merge dev aug26 (#34) * Rename: Optical Depths QED (#2140) This is renaming the runtime added optical depth scalars for QED physics just to create the same names in plotfile and openPMD output. ... — committed to AMReX-Microelectronics/artemis by RevathiJambunathan 3 years ago
- Adds system RDMA-core external to Summit Snapshots environment prior to reconcretizing environment for system RDMA-core external package needed for libfabric to address issues with Adios2: https://g... — committed to mpbelhorn/olcf-spack-environments by mpbelhorn 3 years ago
- Update with development (#42) * Rename: Optical Depths QED (#2140) This is renaming the runtime added optical depth scalars for QED physics just to create the same names in plotfile and openPMD o... — committed to AMReX-Microelectronics/artemis by RevathiJambunathan 3 years ago
@chuckatkins @ax3l Following up on the OLCFHELP-3319 ticket, we put a module libfabric/1.12.1-sysrdma on summit yesterday afternoon that uses the system RDMA-core and have re-built our adios2/2.7.1 modules against it. If you try it out, please update the OLCF ticket with any other issues you find.
Correct - in theory, it should not be necessary to load any libfabric module; only adios2/2.7.1. If you find otherwise, let us know.
An easy workaround should be to just use LD_PRELOAD.
The problem is a mismatch in the rdma-core libraries used by the IBM-provided Spectrum MPI RPM and the spack-provided libfabric. Spectrum MPI is built against the system-provided rdma-core package while spack’s libfabric is built against the spack-provided rdma-core package. The result is that you end up with either MPI or libfabric working but not both in the same build. Setting LD_PRELOAD to the system rdma-core libraries will ensure that those ones get used at runtime by spack’s libfabric (the converse causes Spectrum MPI to segfault).
This is a known problem on summit and a mismatch between the rdma-core libraries used by spectrum-mpi and spack’s libfabric. I believe the admins built a specific module for us to fix this. Try using the libfabric/1.7.0-sysrdma module instead of the default libfabric.
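The rdma-core mismatch and the LD_PRELOAD workaround described in the comments above could be checked and applied roughly as follows. This is a sketch under assumptions, not the command from the thread: the function names `rdma_core_origin` and `system_rdma_preload` are hypothetical, the sonames are taken from the `ldd` listing earlier in this issue, and the directory holding the system rdma-core libraries (shown as /usr/lib64 in the example comment) is a guess that needs to be confirmed on the machine.

```shell
#!/bin/sh
# Hypothetical helper: read `ldd` output on stdin and print where the
# rdma-core libraries (libibverbs, librdmacm) resolve from.
rdma_core_origin() {
    awk '$1 ~ /^lib(ibverbs|rdmacm)\.so/ && $2 == "=>" { print $1, $3 }'
}

# Hypothetical helper: build a colon-separated LD_PRELOAD value from the
# rdma-core libraries that exist in the given directory.
# Usage: system_rdma_preload DIR
system_rdma_preload() {
    dir=$1
    preload=
    for lib in libibverbs.so.1 librdmacm.so.1; do
        if [ -e "$dir/$lib" ]; then
            # Append with a ':' separator only when the list is non-empty.
            preload=${preload:+$preload:}$dir/$lib
        fi
    done
    printf '%s\n' "$preload"
}

# Example (the directory and launcher line are assumptions):
#   ldd build/bin/warpx | rdma_core_origin
#   LD_PRELOAD=$(system_rdma_preload /usr/lib64) jsrun -n 1 build/bin/warpx
```

If `rdma_core_origin` reports paths inside a Spack rdma-core prefix while Spectrum MPI expects the system copies, that would match the mismatch described in the comments.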