
lumi-CPEtools

License information

The lumi-CPEtools packages are developed by the LUMI User Support Team and licensed under the GNU General Public License version 3.0, a copy of which can be found in the LICENSE file in the source repository.

The hpcat tool included in the module is developed by HPE and licensed under an MIT-style license which can be found in the LICENSE file in the source repository.

After loading the module, both license files are available in their respective subdirectories in $EBROOTLUMIMINCPETOOLS/share/licenses.

User documentation (central installation)

Getting help

The tools in lumi-CPEtools are documented through manual pages that can be viewed on LUMI after loading the module. Start with man lumi-CPEtools.

Commands provided:

  • xldd: An ldd-like program to show which versions of Cray PE libraries are used by an executable (see the example below).

  • serial_check: Serial program that prints core and host allocation and affinity information.

  • omp_check: OpenMP program that prints core and host allocation and affinity information.

  • mpi_check: MPI program that prints core and host allocation and affinity information. It is also suitable for testing heterogeneous jobs.

  • hybrid_check: Hybrid MPI/OpenMP program that prints core and host allocation and affinity information. It is also suitable for testing heterogeneous jobs and encompasses the full functionality of serial_check, omp_check and mpi_check.

  • gpu_check (from version 1.1 on): A hybrid MPI/OpenMP program that prints information about thread and GPU binding/mapping on Cray EX Bard Peak nodes such as those in LUMI-G, based on the ORNL hello_jobstep program. (AMD GPU nodes only)

  • hpcat (from version 1.2 on): The HPC Affinity Tracker program developed by HPE. It shows, for each MPI rank, the core(s) that will be used (and per thread if OMP_NUM_THREADS is set), which GPU(s) are accessible to the task, and which network adapter will be used, indicating the NUMA domain for each so that one can easily check whether the resource mapping is ideal.

The various *_check programs are designed to test CPU and GPU binding in Slurm and are the LUST-recommended way to experiment with those bindings.
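
The *_check programs are demonstrated in the interactive examples below. xldd is simply run on the executable to inspect, e.g. (with a hypothetical program name):

xldd ./my_program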

Some interactive examples

The examples assume that the appropriate software stack modules and the lumi-CPEtools module are loaded. The examples show one specific version of the modules, but they work with other versions too. You will also need to add the appropriate -A flag for your project to the salloc commands.
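
For instance, with a hypothetical project account the first salloc command below becomes:

salloc -N2 -pstandard-g -G 16 -t 10:00 -A project_465000001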

gpu_check

salloc -N2 -pstandard-g -G 16 -t 10:00
module load LUMI/24.03 partition/G lumi-CPEtools/1.2-cpeGNU-24.03
srun -n16 -c7 bash -c 'ROCR_VISIBLE_DEVICES=$SLURM_LOCALID gpu_check -l'
srun -n16 -c7 \
    --cpu-bind=mask_cpu:0xfe000000000000,0xfe00000000000000,0xfe0000,0xfe000000,0xfe,0xfe00,0xfe00000000,0xfe0000000000 \
    bash -c 'ROCR_VISIBLE_DEVICES=$SLURM_LOCALID gpu_check -l'

Note that in the first srun command the mapping of tasks to GPUs is not optimal, while in the second srun command it is.
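
The masks in the second command follow from the topology of the Bard Peak nodes. The following is a minimal sketch, assuming the usual LUMI-G numbering in which GCDs 0 to 7 sit closest to the CCDs whose usable cores start at 49, 57, 17, 25, 1, 9, 33 and 41 respectively (the first core of each CCD is left free), that reproduces the mask list used above:

first_core=(49 57 17 25 1 9 33 41)   # first core used by tasks 0..7, one CCD per GCD (assumed mapping)
masks=()
for c in "${first_core[@]}"; do
    masks+=( "$(printf '0x%x' $(( 0x7f << c )))" )   # 7 consecutive cores per task
done
( IFS=',' ; echo "--cpu-bind=mask_cpu:${masks[*]}" )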

hpcat on a GPU node

salloc -N2 -pstandard-g -G 16 -t 10:00
module load LUMI/24.03 partition/G lumi-CPEtools/1.2-cpeGNU-24.03
srun -n16 -c7 bash -c 'ROCR_VISIBLE_DEVICES=$SLURM_LOCALID OMP_NUM_THREADS=7 hpcat'
srun -n16 -c7 \
    --cpu-bind=mask_cpu:0xfe000000000000,0xfe00000000000000,0xfe0000,0xfe000000,0xfe,0xfe00,0xfe00000000,0xfe0000000000 \
    bash -c 'ROCR_VISIBLE_DEVICES=$SLURM_LOCALID OMP_NUM_THREADS=7 hpcat'
srun -n16 -c7 \
    --cpu-bind=mask_cpu:0xfe000000000000,0xfe00000000000000,0xfe0000,0xfe000000,0xfe,0xfe00,0xfe00000000,0xfe0000000000 \
    bash -c 'ROCR_VISIBLE_DEVICES=$SLURM_LOCALID OMP_NUM_THREADS=7 OMP_PLACES=cores hpcat'

Note that in the first srun command the mapping of resources is not very good: the GPUs do not map to their closest chiplet, and as the network adapters are selected based on the CPU NUMA domain, their mapping is not optimal either. In the second case the mapping is optimal, but except with the Cray compilers the OpenMP threads can still move within the chiplet. In the last case the threads are pinned with all compilers.

serial_check, omp_check, mpi_check and hybrid_check

salloc -N1 -pstandard -t 10:00
module load LUMI/24.03 partition/C lumi-CPEtools/1.2-cpeGNU-24.03
srun -n1 -c1 serial_check
srun -n1 -c4 omp_check
srun -n4 -c1 mpi_check
srun -n4 -c4 hybrid_check

One big difference between these tools and hpcat is that these tools show on which core a thread is running at the moment the measurement is taken, while hpcat shows the affinity mask, i.e., all cores that may be used by that thread. gpu_check has the same limitation as omp_check and the other *_check programs.
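
The difference can be illustrated with standard Linux tools rather than the lumi-CPEtools programs themselves (a sketch, run within the allocation above):

srun -n1 -c4 bash -c '
    grep Cpus_allowed_list /proc/self/status   # the affinity mask, which is what hpcat reports
    ps -o psr= -p $$                           # the core in use right now, which is what the *_check tools report
'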

Acknowledgements

The code for hybrid_check and its derivatives serial_check, omp_check and mpi_check is inspired by the xthi program used in the 4-day LUMI comprehensive courses. The hybrid_check program has also been used successfully on other clusters, including non-Cray and non-HPE systems.

The gpu_check program (lumi-CPEtools 1.1 and later) builds upon the hello_jobstep code from ORNL. The program is written specifically for the HPE Cray EX Bard Peak nodes and will not work correctly on other AMD GPU systems or on NVIDIA GPU systems without reworking.

The lumi-CPEtools code is developed by LUST in the lumi-CPEtools repository on the LUMI supercomputer GitHub.

The hpcat program (lumi-CPEtools 1.2 and later) is developed by HPE and provided unmodified.

Pre-installed modules (and EasyConfigs)

To access module help and find out for which stacks and partitions the module is installed, use module spider lumi-CPEtools/<version>.
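
For example, for the version used in the interactive examples above:

module spider lumi-CPEtools/1.2-cpeGNU-24.03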

EasyConfig:

User-installable modules (and EasyConfigs)

Install with the EasyBuild-user module:

eb <easyconfig> -r
To access module help after installation and get reminded for which stacks and partitions the module is installed, use module spider lumi-CPEtools/<version>.
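
For example, to install the version used in the interactive examples above in your personal software stack (assuming the EasyConfig file name follows the usual name-version-toolchain pattern):

module load LUMI/24.03 partition/C EasyBuild-user
eb lumi-CPEtools-1.2-cpeGNU-24.03.eb -r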

EasyConfig:

Technical documentation (central installation)

lumi-CPEtools is developed by the LUST team.

EasyBuild

The EasyConfig is our own development as this is also our own tool. We provide full versions for each Cray PE, and a restricted version using the SYSTEM toolchain for the CrayEnv software stack.

Version 1.0

  • The EasyConfig is our own design.

Version 1.1

  • The EasyConfig builds upon the 1.0 one, but with some important changes as there is now a tool that should only be installed in partition/G. Hence there are now separate Makefile targets and additional variables for the Makefile.

  • For the recompile of 23.09 with ROCm 6 we needed to make the same changes as for 23.12, described below.

  • The cpeAMD version required changes to compile in 23.12 (a sketch follows after this list):

    • The rocm module now needs to be loaded explicitly to have access to the HIP runtime libraries and header files.

    • We needed to unload the accelerator target module as we do use OpenMP but do not want OpenMP offload.

    • There is a problem when linking with the AMD compilers of code that uses ROCm libraries when LIBRARY_PATH is set.

  • It looks like the compiler wrappers have changed in 24.03, as unloading the accelerator target module was no longer needed for the cpeAMD version.
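
A sketch of the kind of environment adjustments described above for the cpeAMD 23.12 build; the module names and the LIBRARY_PATH workaround are assumptions for illustration, not copied from the actual EasyConfig:

module load rocm                         # give access to the HIP runtime libraries and header files
module unload craype-accel-amd-gfx90a    # use OpenMP, but without OpenMP offload
unset LIBRARY_PATH                       # avoid the AMD compiler link problem with ROCm libraries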

Version 1.2

  • Transformed the EasyConfig from version 1.1 to a Bundle to be able to add hpcat using its own installation procedure.

  • Building hpcat:

    • LUMI lacks the hwloc-devel package, so we copied the header files from another system and download them from LUMI-O during the installation.

    • The Makefile was modified to integrate better with EasyBuild and to work around a problem with finding the hwloc library on LUMI.

      Rather than writing a new Makefile or a patch, we actually used a number of sed commands to edit the Makefile (a sketch follows after this list):

      • mpicc was replaced with $(CC) so that the wrappers are used instead.
      • -O3 was replaced with $(CFLAGS) to pick up the options from EasyBuild.
      • '-fopenmp' is still managed by the Makefile and not by EasyBuild: on the one hand because the ultimate goal is to integrate with another package that sometimes needs the OpenMP flags and sometimes does not, and on the other hand so that $(CFLAGS) can also be used for hipcc.
      • -lhwloc is replaced with -Wl,/usr/lib64/libhwloc.so.15. We had to pass the library through -Wl, as the hipcc driver otherwise treated it as a source file.
      • As '-L.' is not needed, it is omitted.
    • As there is no make install, we simply use the MakeCp EasyBlock, doing the edits to the Makefile in prebuildopts.

      Note that we copy the libhip.so file to the lib directory, as that is the conventional place to store shared objects, but hpcat does not find it there, so we also create a symbolic link to it in the bin subdirectory.

    • Note that the accelerator target module should not be loaded when using the wrappers as the OpenMP offload options cause a problem in one of the header files used.
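
A sketch of the kind of sed edits described above; illustrative only, not the literal prebuildopts from the EasyConfig:

sed -i -e 's/mpicc/$(CC)/' \
       -e 's/-O3/$(CFLAGS)/' \
       -e 's|-lhwloc|-Wl,/usr/lib64/libhwloc.so.15|' \
       -e 's/ -L\.//' \
       Makefile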

Technical documentation (user EasyBuild installation)

EasyBuild

Version 1.1 for Open MPI

  • The EasyConfigs are similar to those for the Cray MPICH versions, but

    • Compilers need to be set manually in buildopts to use the Open MPI compiler wrappers.

    • Before building, some modules need to be unloaded again (which ones depends on the specific Open MPI module).

Archived EasyConfigs

The EasyConfigs below are additional EasyConfigs that are not directly available on the system for installation. Users are advised to use the newer ones; these archived ones are unsupported. They are still provided as a source of information, e.g., to understand the configuration that was used for earlier work on the system.