pyfluent: Error when running PyFluent in HPC/Slurm environment

🔍 Before submitting the issue

  • I have searched among the existing issues
  • I am using a Python virtual environment

🐞 Description of the bug

I'm following the example of using PyFluent with a scheduler on an HPC machine (https://fluent.docs.pyansys.com/version/stable/user_guide/launching_ansys_fluent.html#scheduler-support). I specify the path where Fluent is installed (and have tried different paths as well), but with no luck.

The way I'm launching Fluent from the Python script is:

solver = launch_fluent(mode="solver", precision='double', show_gui=False, gpu=True, start_timeout=60)

Below are the bash file and the slurm-out file.

📝 Steps to reproduce

bash file:

#!/bin/bash
#SBATCH --job-name=FLUENT-2023R2-gpu-case
#SBATCH --partition=ampere
#SBATCH --gres=gpu:1
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=2:00:00

module load intel-oneapi-compilers/2022.0.2-sygdcrc python/3.10.10-tuelrz
module load ansys/2023R2

#CREATE HOSTFILE
echo "Creating hostfile…"
srun hostname > ${SLURM_JOBID}.hostfile

#ACTIVATE VENV
echo "Activating virtual environment…"
. ./myenv/bin/activate

#EXPORT FLUENT SO IT CAN BE FOUND
echo "Exporting AWP_ROOT232…"
export AWP_ROOT232=/mnt/apps/prebuilt/ansys/2023R2/ansys_inc/v232/

#RUN PYFLUENT
echo "Running PyFluent…"
python benchmark_wing.py

#AS SOON AS RUN IS COMPLETE, DELETE HOSTFILE
echo "Cleaning up hostfile…"
rm -f ${SLURM_JOBID}.hostfile

Slurm Out file:

remove ansys 2023R2 (PATH LD_LIBRARY_PATH ANSYS_ROOT ANSYSLI_SERVERS ANSYSLMD_LICENSE_FILE CFX5RSH ANSWAIT)
load ansys 2023R2 (PATH LD_LIBRARY_PATH ANSYS_ROOT ANSYSLI_SERVERS ANSYSLMD_LICENSE_FILE CFX5RSH ANSWAIT)
Creating hostfile…
Exporting AWP_ROOT232…
Activating virtual environment…
Running PyFluent…
pyfluent.launcher ERROR: Exception caught - RuntimeError: The launch process has been timed out.
Trying to open solver
Traceback (most recent call last):
  File "/home/t/thomasdn/Aristotle_benchmark/INDIANA_Benchmark_coarse_mesh/myenv/lib/python3.10/site-packages/ansys/fluent/core/launcher/launcher.py", line 758, in launch_fluent
    raise ex
  File "/home/t/thomasdn/Aristotle_benchmark/INDIANA_Benchmark_coarse_mesh/myenv/lib/python3.10/site-packages/ansys/fluent/core/launcher/launcher.py", line 742, in launch_fluent
    _await_fluent_launch(
  File "/home/t/thomasdn/Aristotle_benchmark/INDIANA_Benchmark_coarse_mesh/myenv/lib/python3.10/site-packages/ansys/fluent/core/launcher/launcher.py", line 383, in _await_fluent_launch
    raise RuntimeError("The launch process has been timed out.")
RuntimeError: The launch process has been timed out.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/t/thomasdn/Aristotle_benchmark/INDIANA_Benchmark_coarse_mesh/benchmark_wing.py", line 8, in <module>
    solver = launch_fluent(mode="solver", precision='double', show_gui=False, gpu=True, start_timeout=60)
  File "/home/t/thomasdn/Aristotle_benchmark/INDIANA_Benchmark_coarse_mesh/myenv/lib/python3.10/site-packages/ansys/fluent/core/launcher/launcher.py", line 797, in launch_fluent
    raise LaunchFluentError(launch_cmd) from ex
ansys.fluent.core.launcher.launcher.LaunchFluentError: Fluent Launch string: /mnt/apps/prebuilt/ansys/2023R2/ansys_inc/v232/fluent/bin/fluent 3ddp -gpu -t20 -cnf=cn50:20 -sifile=/tmp/serverinfo-w7rr7m0a.txt -nm -hidden
Cleaning up hostfile…
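For reference, the -t20 and -cnf=cn50:20 fragments in the launch string above are derived from the Slurm hostfile generated by srun hostname. A minimal stdlib sketch of that derivation (the helper name is hypothetical, not part of PyFluent):

```python
from collections import Counter

def fluent_parallel_args(hostfile_lines):
    """Build the -t/-cnf fragments Fluent expects from an
    'srun hostname' style hostfile (one hostname per line)."""
    counts = Counter(line.strip() for line in hostfile_lines if line.strip())
    total = sum(counts.values())
    cnf = ",".join(f"{host}:{n}" for host, n in counts.items())
    return f"-t{total}", f"-cnf={cnf}"

print(fluent_parallel_args(["cn50"] * 20))
# -> ('-t20', '-cnf=cn50:20')
```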

💻 Which operating system are you using?

Linux

📀 Which ANSYS version are you using?

FLUENT 2023R2

🐍 Which Python version are you using?

3.10

📦 Installed packages

ansys-fluent-core

About this issue

  • Original URL
  • State: closed
  • Created 6 months ago
  • Comments: 16 (5 by maintainers)

Most upvoted comments

@dnwillia @raph-luc The -hidden flag has been used as the default to launch Fluent on both platforms since the beginning (PR #7). I think at that time there was an issue with displaying/saving postprocessing images in -gu mode on Linux, which I cannot reproduce now. We can try switching the default to -gu if we don't find any other issue (I'll look into this).

@raph-luc What the user has in mind is simply running Fluent in batch within the scheduler environment, with no interaction from a remote Python console necessary. This is a pretty common way to use Slurm, and it is why I added the support documented in the link that @christospliakos provided.

@christospliakos The documented example should work, so there is no need to update the documentation. What happens if you just start with minimal args:

solver = launch_fluent(mode="solver", precision='double')

Does that work? I must admit I've never tried this with the gpu argument. Also, is there any information in the standard output, or do you happen to be writing a Fluent transcript file we could look at?

@raph-luc It seems that the timeout is not the reason. I usually get through the queue instantly, and I raised the timeout to 5 minutes.

I would like the Slurm job to control PyFluent, since it is in the job (via the bash file) that I request one GPU unit; only then do I get access to the GPU. For your suggested approach of letting PyFluent manage the Fluent Slurm job: do I need one job to run the PyFluent script, which then creates another job for Fluent itself?

The way it works is very confusing, in my opinion. The approach described in the old documentation is very intuitive: a single job runs PyFluent, which manages the Fluent launch using the computational resources granted at job initialization.
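To make the one-job model concrete, here is a sketch of sizing the Fluent launch from the Slurm job's own environment. The helper name is hypothetical; processor_count is taken to be a launch_fluent parameter, but treat the exact call in the comment as an assumption to check against the PyFluent docs:

```python
import os

def processor_count_from_env(env=None):
    """Derive the core count for Fluent from the Slurm job environment
    (hypothetical helper; the variable names are standard Slurm ones)."""
    if env is None:
        env = os.environ
    ntasks = int(env.get("SLURM_NTASKS", "1"))
    cpus_per_task = int(env.get("SLURM_CPUS_PER_TASK", "1"))
    return ntasks * cpus_per_task

# Inside the job's Python script, the result would then feed the launch, e.g.:
# solver = launch_fluent(mode="solver", precision="double",
#                        processor_count=processor_count_from_env())

print(processor_count_from_env({"SLURM_NTASKS": "1", "SLURM_CPUS_PER_TASK": "20"}))
# -> 20
```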

As I'm still finding my way through HPC/Slurm, complete and updated documentation with a full example would be very helpful.