# Exercises: Slurm on LUMI
## Basic exercises
-   In this exercise we check how cores would be assigned to a shared memory program. Run a single task on the CPU partition with `srun` using 16 CPU cores. Inspect the default task allocation with the `taskset` command (`taskset -cp $$` will show you the CPU numbers allocated to the current process).

    Click to see the solution.

    ```
    srun --partition=small --nodes=1 --tasks=1 --cpus-per-task=16 --time=5 --account=<project_id> bash -c 'taskset -cp $$'
    ```

    Note that you need to replace `<project_id>` with the actual project account ID of the form `project_` plus a 9-digit number (and this argument can be omitted if you use the `exercises/small` module during the course).

    The command runs a single process (a `bash` shell with the native Linux `taskset` tool showing the process's CPU affinity) on a compute node. You can use the `man taskset` command to see how the tool works. For a Slurm-side view of the binding, see the first sketch after this exercise list.
-   Next we'll try a hybrid MPI/OpenMP program. For this we will use the `hybrid_check` tool from the `lumi-CPEtools` module of the LUMI Software Library. This module is preinstalled on the system and has versions for all releases of the LUMI software stack and for all toolchains and partitions in those stacks.

    Use the simple job script below to run a parallel program with multiple tasks (MPI ranks) and threads (OpenMP). Submit it with `sbatch` on the CPU partition and check the task and thread affinity.

    ```bash
    #!/bin/bash -l
    #SBATCH --partition=small           # Partition (queue) name
    #SBATCH --nodes=1                   # Total number of nodes
    #SBATCH --ntasks-per-node=8         # 8 MPI ranks per node
    #SBATCH --cpus-per-task=16          # 16 threads per task
    #SBATCH --time=5                    # Run time (minutes)
    #SBATCH --account=<project_id>      # Project for billing

    module load LUMI/24.03
    module load lumi-CPEtools/1.2-cpeGNU-24.03

    srun --cpus-per-task=$SLURM_CPUS_PER_TASK hybrid_check -n -r
    ```

    Be careful with copy/paste of the script body, as problems with special characters or a double dash may occur depending on the editor you use.
    Click to see the solution.

    Save the script contents into the file `job.sh` (you can use the `nano` console text editor, for instance). Remember to use a valid project account name (or omit that line if you are using the `exercises/small` module).

    Submit the job script using the `sbatch` command:

    ```
    sbatch job.sh
    ```

    The job output is saved in the `slurm-<job_id>.out` file. You can view its contents with either the `less` or `more` shell command.

    The actual task/thread affinity may depend on the specific OpenMP runtime (if you use this job script literally, it will be the GNU OpenMP runtime). See the OpenMP binding sketch after this exercise list for environment variables that influence it.
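If you want Slurm itself to report the binding from the first exercise, you can add the standard `srun` option `--cpu-bind=verbose` to the same request. The following is only a minimal sketch; the exact binding you see depends on the partition defaults.

```bash
# Same single-task request as in the first exercise, with Slurm's own
# binding report enabled. Replace <project_id> as before (or drop the
# --account option if you use the exercises/small module).
srun --partition=small --nodes=1 --tasks=1 --cpus-per-task=16 --time=5 \
     --account=<project_id> --cpu-bind=verbose \
     bash -c 'taskset -cp $$'
```

Slurm should print a `cpu-bind` line describing the mask it applied to the task, which you can compare with the `taskset` output.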
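As a follow-up to the second exercise, you can experiment with how the OpenMP runtime places threads within each task. The sketch below only shows lines you could add to `job.sh` just before the `srun` command; the variables are standard OpenMP environment variables (`OMP_PROC_BIND`, `OMP_PLACES`, `OMP_DISPLAY_AFFINITY`), and the affinity you observe still depends on the runtime and the Slurm settings.

```bash
# Optional additions to job.sh, placed before the srun line:
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # one thread per allocated core
export OMP_PROC_BIND=close                    # keep threads close together
export OMP_PLACES=cores                       # one place per physical core
export OMP_DISPLAY_AFFINITY=TRUE              # ask the runtime to print its binding

srun --cpus-per-task=$SLURM_CPUS_PER_TASK hybrid_check -n -r
```

Rerun the job and compare the affinity reported by `hybrid_check`; trying `spread` instead of `close` is also instructive.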
## Advanced exercises
These exercises combine material from several chapters of the tutorial. This particular exercise makes most sense if you will be building software on LUMI (and remember that more of you will be doing that than you may expect)!
-   Build the `hello_jobstep` program using an interactive shell on a GPU node. You can pull the source code for the program from the git repository https://code.ornl.gov/olcf/hello_jobstep.git. It uses a `Makefile` for building and requires Clang and HIP. The code also contains a `README.md` file with instructions, but they will need some changes. The `hello_jobstep` program is actually the main source of inspiration for the `gpu_check` program in the `lumi-CPEtools` modules for `partition/G`. Try to run the program interactively.

    The `Makefile` contains a conditional section to set the proper arguments for the compiler. LUMI is very similar to Frontier, so when calling the `make` program to build the code from the `Makefile`, don't use simply `make` as suggested in the `README.md`, but use

    ```
    make LMOD_SYSTEM_NAME="frontier"
    ```

    Note: Given that the reservation is on `standard-g`, where you can only get whole nodes (which is overkill for this example), it is better to request a single GPU with 7 cores and 60GB of memory on the `small-g` partition.

    Click to see the solution.
    Clone the code using the `git` command:

    ```
    git clone https://code.ornl.gov/olcf/hello_jobstep.git
    ```

    It will create a `hello_jobstep` directory containing the source code and the `Makefile`.

    Allocate resources for a single task with a single GPU using `salloc`:

    ```
    salloc --partition=small-g --tasks=1 --cpus-per-task=7 --gpus=1 --mem=60G --time=10 --account=<project_id>
    ```

    Note that, after the allocation is granted, you receive a new shell, but you are still on the login node. You need to use the `srun` command to run on the allocated node.

    Start an interactive session on the GPU node:

    ```
    srun --pty bash -i
    ```

    Now you are on the compute node. The `--pty` option for `srun` is required to interact with the remote shell.
    Enter the `hello_jobstep` directory. There is a `Makefile` to build the code with the `make` command, but first we need to make sure that a proper programming environment is loaded.

    As an example we will build with the system default programming environment, `PrgEnv-cray` in `CrayEnv`. Just to be sure, we'll load the programming environment module explicitly as well.

    The build will fail if the `rocm/6.0.3` module is not loaded when using `PrgEnv-cray`. Whereas the instructions suggest simply using the `rocm` module, we specify a version because, at the time of the course, there was a newer but not fully supported `rocm` module on the system.

    ```
    module load CrayEnv
    module load PrgEnv-cray
    module load rocm/6.0.3
    ```

    Alternatively, you can build in the `LUMI/24.03` stack using the EasyBuild toolchain instead of `PrgEnv-cray`:

    ```
    module load LUMI/24.03 partition/G cpeCray
    ```

    In this case, you do not need to load the ROCm module as it is loaded automatically by `cpeCray/24.03` when working in `partition/G`.
    To build the code, use

    ```
    make LMOD_SYSTEM_NAME="frontier"
    ```

    You need to set the `LMOD_SYSTEM_NAME="frontier"` variable for `make` because the code originates from the Frontier system and doesn't know LUMI. (As an exercise, you can try to fix the `Makefile` and enable it for LUMI :))
    Finally, you can just execute the `./hello_jobstep` binary to see how it behaves:

    ```
    ./hello_jobstep
    ```

    Note that executing the program with `srun` inside this interactive `srun` session will result in a hang. You need to add the `--overlap` option to `srun` to mitigate this (see the sketch after this exercise).

    Remember to terminate your interactive session with the `exit` command:

    ```
    exit
    ```

    Then do the same for the shell created by `salloc`. (If you prefer to run everything non-interactively, see the batch-script sketch after this exercise.)
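To illustrate the `--overlap` remark above: the interactive shell started with `srun --pty bash -i` is itself a job step that occupies the allocated resources, so a second job step launched from inside it must be allowed to share them. A minimal sketch, assuming you are still in the interactive session in the `hello_jobstep` directory:

```bash
# Run the program as an explicit job step that overlaps with the
# resources already held by the interactive shell's job step.
srun --overlap --ntasks=1 ./hello_jobstep
```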
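If you prefer to run the program non-interactively, the same resources and modules can be requested in a batch job. The script below is only a sketch combining the options and modules used above: it assumes the binary has already been built, that you submit from the `hello_jobstep` directory, and that `<project_id>` is again replaced with your project account.

```bash
#!/bin/bash -l
#SBATCH --partition=small-g         # A single-GPU job fits on small-g
#SBATCH --nodes=1                   # One node
#SBATCH --ntasks=1                  # One task
#SBATCH --cpus-per-task=7           # 7 cores, as in the salloc example
#SBATCH --gpus=1                    # One GPU
#SBATCH --mem=60G                   # 60GB of memory
#SBATCH --time=5                    # Run time (minutes)
#SBATCH --account=<project_id>      # Project for billing

# Same environment as used for the PrgEnv-cray build above.
module load CrayEnv
module load PrgEnv-cray
module load rocm/6.0.3

srun ./hello_jobstep
```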