Exercises 4: Running jobs with Slurm
Intro
For these exercises, you'll need to take care of some settings:
- For the CPU exercises we advise using the small partition, and for the GPU exercises the standard-g partition.
- During the course you can use the course training project project_465001603 for these exercises. A few days after the course you will need to use a different project on LUMI.
- On December 11 we have a reservation that you can use (through #SBATCH --reservation=...):
  - For the small partition, the reservation name is LUMI_Intro_small
  - For the standard-g partition, the reservation name is LUMI_Intro_standard-g
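The settings above can be written directly into a batch script. The following is only a minimal sketch for a short single-task CPU job: the job name and the time limit are illustrative assumptions, and the reservation line only makes sense on the day of the reservation.

```shell
#!/bin/bash
# Hypothetical job name, chosen for illustration only.
#SBATCH --job-name=exercise-cpu
# Course training project (only usable until a few days after the course).
#SBATCH --account=project_465001603
# Advised partition for the CPU exercises.
#SBATCH --partition=small
# Reservation for the small partition.
#SBATCH --reservation=LUMI_Intro_small
#SBATCH --nodes=1
#SBATCH --ntasks=1
# Assumed time limit, adjust as needed.
#SBATCH --time=00:05:00

# Slurm defines these variables inside a job. The fallback values only keep
# the script runnable outside of Slurm, e.g. for a quick syntax check.
msg="Job ${SLURM_JOB_ID:-none} runs in partition ${SLURM_JOB_PARTITION:-none}"
echo "$msg"
```

For the GPU exercises the same sketch would use standard-g and LUMI_Intro_standard-g instead.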
An alternative (during the course only) to manually specifying the account, the partition and the reservation is to set them through modules. For this, first add an additional directory to the module search path:
module use /appl/local/training/modules/2day-20241210
and then you can load either the module exercises/small or exercises/standard-g.
Check what these modules do...
Try, e.g., module show exercises/small to get an idea of what these modules do. Can you see which environment variables they set?
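As background for reading the module show output: the Slurm commands take defaults from documented input environment variables (SBATCH_*, SALLOC_* and SLURM_* for sbatch, salloc and srun respectively). The sketch below sets such variables by hand. It is an assumption that a module would use exactly these variables, so compare it with what module show actually reports.

```shell
#!/bin/bash
# Assumption: illustrative values only; the course modules may set a
# different set of variables or different values.
project="project_465001603"
partition="small"
reservation="LUMI_Intro_small"

# sbatch reads SBATCH_*, salloc reads SALLOC_* and srun reads SLURM_*.
for prefix in SBATCH SALLOC SLURM; do
    export "${prefix}_ACCOUNT=${project}"
    export "${prefix}_PARTITION=${partition}"
    export "${prefix}_RESERVATION=${reservation}"
done

# Show what was set.
env | grep -E '^(SBATCH|SALLOC|SLURM)_(ACCOUNT|PARTITION|RESERVATION)=' | sort
```

Run as a script, these exports only affect the script itself; to change your current shell you would have to source the file instead.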
Exercises
- Start with the exercises on "Slurm on LUMI"
- Proceed with the exercises on "Process and Thread Distribution and Binding"
Q&A
- Question related to Hybrid MPI/OpenMP: ntasks-per-node defines the number of cores or MPI ranks and cpus-per-task defines the number of threads per task or per rank? You mentioned that there are two threads per core, so how come 16 threads per task is possible? Maybe I am missing something!
  - ntasks-per-node does exactly what it says: tasks per node, so MPI ranks in most cases. It says nothing about the cores available to each MPI rank (the default is 1). cpus-per-task then says how many cores should be reserved for each task.
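The relation between the two options can be sketched in a job script header as follows. This is a minimal illustration, not a complete job script: account and partition are left out, the values 8 and 16 are example choices, and ./hybrid_app is a hypothetical executable name.

```shell
#!/bin/bash
#SBATCH --nodes=1
# 8 tasks on each node, so 8 MPI ranks per node in a typical MPI program.
#SBATCH --ntasks-per-node=8
# 16 cores reserved for each task, available to that rank's OpenMP threads.
#SBATCH --cpus-per-task=16

# Inside a job Slurm exports these variables; the defaults mirror the
# directives above so that the sketch also runs outside of Slurm.
ranks_per_node="${SLURM_NTASKS_PER_NODE:-8}"
cores_per_rank="${SLURM_CPUS_PER_TASK:-16}"

# One OpenMP thread per reserved core.
export OMP_NUM_THREADS="${cores_per_rank}"

echo "ranks per node: ${ranks_per_node}, threads per rank: ${OMP_NUM_THREADS}"

# The program would then be started with srun, for example:
#   srun --cpus-per-task="${cores_per_rank}" ./hybrid_app
# where ./hybrid_app is a hypothetical executable name.
```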
- So in case of nodes=1, ntasks-per-node=8 and cpus-per-task=16, how many CPU hours will be billed if this runs for 1 hour?
  - In this case it is easy because you are filling entire nodes on LUMI-C, so the amount of memory that you consume is not a factor. It will be 128 CPU hours per hour that the job runs.
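The arithmetic behind that number can be written out as a small shell calculation. It is a sketch that assumes billing simply follows the number of allocated cores, which, as the answer notes, holds here because the request fills a whole LUMI-C node and memory does not play a role.

```shell
#!/bin/bash
# Values from the question above.
nodes=1
ntasks_per_node=8
cpus_per_task=16
hours=1

# Cores allocated in total, and the resulting CPU hours for the run.
cores=$(( nodes * ntasks_per_node * cpus_per_task ))
core_hours=$(( cores * hours ))

echo "cores=${cores} core_hours=${core_hours}"
# prints: cores=128 core_hours=128
```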
- For adjusting the Makefile for LUMI, does one need to add the corresponding CFLAGS line in the Makefile?
  - That depends. There are other ways to set environment variables that a Makefile can use. And it depends on how the rules are defined in the Makefile. Is it for one of the exercises of this course?
  Yes, it is the advanced exercise I was trying.
  - They didn't specify an optimisation level, while it is typically safer to add one, but apart from that the options are OK for the Cray compiler (PrgEnv-cray). For this particular program, which runs in a fraction of a second, the optimisation level is of course not that important...
  However, to follow the exercise and build this for lumi, does one need to replace the -D__HIP_ARCH_GFX90A__ flag with the one for the corresponding ROCm architecture?
  - No, that is why I asked to compile with make LMOD_SYSTEM_NAME="frontier", as that should enable the proper line in the Makefile to do that. The test on line 18 should then ensure that the code on line 19 is executed, which adds the proper flags. Frontier has almost exactly the same node type as LUMI, only 3 times more of them...
  - Just to be more clear, the GPU architecture on LUMI is the GFX90A.