mpi4py: MPICommExecutor & MPIPoolExecutor Freeze Indefinitely

Architecture: Power9 (Summit supercomputer)

MPI Version:

  Package: IBM Spectrum MPI
  Spectrum MPI: 10.4.0.03rtm4
  Spectrum MPI repo revision: IBM_SPECTRUM_MPI_10.04.00.03_2021.01.12_RTM4
  Spectrum MPI release date: Unreleased developer copy

mpi4py Version: 3.1.1

Reproduce Script:

from mpi4py.futures import MPICommExecutor
from mpi4py import MPI
import time
import os

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

# square the input, then scale it by the rank of the process that runs the task
def apply_fun(i):
    print('running apply!', flush=True)
    return i**2*rank

print('pid:', os.getpid(), flush=True)
print('rank:', rank, flush=True)

# this *does* implement map-reduce and supposedly works on legacy systems without dynamic process management
# (I've gotten it working with `jsrun -n 1` but so far no luck with multiple processes)
# see the docs: https://mpi4py.readthedocs.io/en/stable/mpi4py.futures.html?highlight=MPICommExecutor#mpicommexecutor
with MPICommExecutor(MPI.COMM_WORLD, root=0) as executor:
    if executor is not None:
        print('Executor started from root!', flush=True)
        answer = list(executor.map(apply_fun, range(41)))
        print('pid:', os.getpid(), 'rank:', rank, answer, flush=True)

jsrun python mpi_test.py output:

Warning: OMP_NUM_THREADS=16 is greater than available PU's
Warning: OMP_NUM_THREADS=16 is greater than available PU's
pid: 1448946
rank: 1
pid: 1448945
rank: 0
Executor started from root!
running apply!

Then it freezes indefinitely. By the way, jsrun is Summit’s ‘custom version’ of mpirun/mpiexec, and it works really well in general (in contrast to mpirun and mpiexec). With this exact same setup I had no problem using comm.gather() and comm.scatter(); only the executors fail, which is troublesome because I really like the map-based API.
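Since gather and scatter do work on this setup, one possible stopgap is a static map built only on those two collectives. The block below is a rough sketch under that assumption, not a replacement for the executors: `split_round_robin`, `merge_round_robin`, and `mpi_map` are hypothetical helper names that are not part of mpi4py, and the work is divided up front rather than scheduled dynamically.

```python
from typing import Callable, Iterable, List, Optional, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_round_robin(items: Sequence[T], n_parts: int) -> List[List[T]]:
    """Deal items into n_parts lists: item k goes to part k % n_parts."""
    return [list(items[r::n_parts]) for r in range(n_parts)]


def merge_round_robin(parts: Sequence[Sequence[R]]) -> List[R]:
    """Inverse of split_round_robin: restore the original item order."""
    total = sum(len(p) for p in parts)
    n_parts = len(parts)
    return [parts[k % n_parts][k // n_parts] for k in range(total)]


def mpi_map(fn: Callable[[T], R], items: Iterable[T], root: int = 0) -> Optional[List[R]]:
    """Hypothetical helper: map fn over items using only scatter and gather.

    Must be called on every rank. Returns the ordered results on the
    root rank and None on all other ranks.
    """
    # Imported lazily so the pure-Python helpers above work without MPI.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # Only the root needs the full input; it prepares one chunk per rank.
    chunks = split_round_robin(list(items), comm.Get_size()) if rank == root else None

    my_chunk = comm.scatter(chunks, root=root)     # each rank receives its chunk
    my_results = [fn(x) for x in my_chunk]         # local work, root included
    gathered = comm.gather(my_results, root=root)  # list of per-rank results on root

    return merge_round_robin(gathered) if rank == root else None
```

In the reproduce script, this would be called as `answer = mpi_map(apply_fun, range(41))` on every rank, with only rank 0 printing the result. Unlike MPICommExecutor, the root rank also computes a share of the tasks, and a slow task cannot be rebalanced onto an idle rank.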

About this issue

  • State: open
  • Created 2 years ago
  • Comments: 26 (13 by maintainers)

Most upvoted comments

Intel is basically telling me to “go away” if it’s not reproducible with oneAPI, so that’s the only thing I’m trying these days. Yes, plenty of spawn problems.