Debugging at Scale – gdb4hpc, valgrind4hpc, ATP, stat
Presenter: Thierry Braconnier (HPE)
Tools discussed:
- STAT: Stack Trace Analysis Tool (module `cray-stat`)
- ATP: Abnormal Termination Processing (module `atp`)
- GDB for HPC (module `gdb4hpc`). Works for CPU and GPU.
- Valgrind for HPC (module `valgrind4hpc`)
- Sanitizers for HPC (module `sanitizers4hpc`)
- `CRAY_ACC_DEBUG` environment variable for CCE OpenACC/OpenMP offload
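As a rough illustration of how the items above are typically put to use, the sketch below loads one of the listed tool modules and enables runtime commentary for CCE offload code. The module name comes from the list; the verbosity level `2` and the application name `my_offload_app` are assumptions chosen for the example, not values from the session.

```shell
#!/bin/bash
# Sketch: prepare a debugging environment with the tools listed above.
# The verbosity level and the application name are illustrative assumptions.

# Load a tool module only where an environment-modules system is present.
# The other tools (cray-stat, atp, valgrind4hpc, sanitizers4hpc) load the same way.
if command -v module >/dev/null 2>&1; then
    module load gdb4hpc
fi

# Ask the CCE offload runtime for debug commentary (written to stderr).
# Level 2 is an arbitrary pick; higher levels are more verbose.
export CRAY_ACC_DEBUG=2

# Hypothetical run of an offload-enabled binary built with CCE:
# srun ./my_offload_app
echo "CRAY_ACC_DEBUG=${CRAY_ACC_DEBUG}"
```

The `command -v module` guard only keeps the sketch harmless on systems without a module environment; on LUMI the `module load` line would normally be run directly.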
Archived materials on LUMI:
- Slides: `/appl/local/training/2p3day-20250303/files/LUMI-2p3day-20250303-306-Debugging_at_Scale.pdf`
- Recording: `/appl/local/training/2p3day-20250303/recordings/306-Debugging_at_Scale.mp4`
These materials can only be distributed to actual users of LUMI (active user account).
Q&A
- How does the gdb4hpc launch call map to srun, in particular the number of ranks per node and CPUs per task?
- You can use `--launcher-args="--tasks-per-node=4"`, for example, to inject arguments that are passed to srun by the launch command. Starting gdb4hpc and typing "help launch" gives more information on this.
- You can use