mpi4py: MPICommExecutor & MPIPoolExecutor Freeze Indefinitely
Architecture: Power9 (Summit Super Computer)
MPI Version:
Package: IBM Spectrum MPI
Spectrum MPI: 10.4.0.03rtm4
Spectrum MPI repo revision: IBM_SPECTRUM_MPI_10.04.00.03_2021.01.12_RTM4
Spectrum MPI release date: Unreleased developer copy
mpi4py Version: 3.1.1
Reproduce Script:
from mpi4py.futures import MPICommExecutor
from mpi4py import MPI
import os

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

# square the numbers (and scale by the rank of the worker that runs the task)
def apply_fun(i):
    print('running apply!', flush=True)
    return i**2 * rank

print('pid:', os.getpid(), flush=True)
print('rank:', rank, flush=True)

# MPICommExecutor *does* implement map-reduce and supposedly works on legacy
# systems without dynamic process management (I've gotten it working with
# `jsrun -n 1`, but so far no luck with multiple processes).
# See the docs: https://mpi4py.readthedocs.io/en/stable/mpi4py.futures.html?highlight=MPICommExecutor#mpicommexecutor
with MPICommExecutor(MPI.COMM_WORLD, root=0) as executor:
    if executor is not None:
        print('Executor started from root!', flush=True)
        answer = list(executor.map(apply_fun, range(41)))
        print('pid:', os.getpid(), 'rank:', rank, answer, flush=True)
Output of `jsrun python mpi_test.py`:
Warning: OMP_NUM_THREADS=16 is greater than available PU's
Warning: OMP_NUM_THREADS=16 is greater than available PU's
pid: 1448946
rank: 1
pid: 1448945
rank: 0
Executor started from root!
running apply!
Then it freezes indefinitely. By the way, jsrun is Summit's custom version of mpirun/mpiexec, and in general it works really well (in contrast to mpirun and mpiexec on this system). Also, with this exact same setup I had no problem using comm.gather() and comm.scatter(); it is only the executors that don't work, which is troublesome because I really like the map-based API.
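For contrast, here is a minimal sketch of the scatter/compute/gather pattern that does work for me on the same setup (the round-robin chunking is my own choice for illustration):

from mpi4py import MPI
import os

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

def apply_fun(i):
    return i**2 * rank

# Root splits the inputs into one chunk per rank (round-robin split).
if rank == 0:
    chunks = [list(range(41))[r::size] for r in range(size)]
else:
    chunks = None

# Every rank receives its chunk, computes locally, and root gathers the results.
local = comm.scatter(chunks, root=0)
partial = [apply_fun(i) for i in local]
results = comm.gather(partial, root=0)

if rank == 0:
    print('pid:', os.getpid(), 'rank:', rank, results, flush=True)

Separately, the mpi4py docs describe launching scripts as `mpiexec -n N python -m mpi4py.futures mpi_test.py`, which lets MPIPoolExecutor share COMM_WORLD instead of spawning workers; I haven't verified how that runner behaves under jsrun.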
Intel is basically telling me to "go away" if it's not reproducible with oneAPI, so that's the only stack I'm trying these days. Yes, plenty of spawn problems.
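Since MPIPoolExecutor relies on dynamic process management (MPI_Comm_spawn), a minimal spawn smoke test can separate MPI-level spawn failures from mpi4py.futures bugs. The sketch below is my own (the spawn_parent.py/child.py filenames are made up); if this hangs, the executors don't stand a chance:

# spawn_parent.py: run with a single process, e.g. `jsrun -n 1 python spawn_parent.py`
from mpi4py import MPI
import sys

# Spawn two child interpreters; a hang or error here points at the MPI
# implementation's dynamic process management, not at mpi4py.futures.
children = MPI.COMM_SELF.Spawn(sys.executable, args=['child.py'], maxprocs=2)
for i in range(2):
    # source is a rank in the remote (child) group of the intercommunicator
    print(children.recv(source=i, tag=0), flush=True)
children.Disconnect()

# child.py
from mpi4py import MPI

parent = MPI.Comm.Get_parent()
# dest=0 addresses the single parent process in the remote group
parent.send('hello from child %d' % MPI.COMM_WORLD.Get_rank(), dest=0, tag=0)
parent.Disconnect()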