Debugging at Scale – gdb4hpc, valgrind4hpc, ATP, STAT
Presenter: Thierry Braconnier (HPE)
Archived materials on LUMI:
-
Slides:
/appl/local/training/4day-20231003/files/LUMI-4day-20231003-2_03_Debugging_at_Scale.pdf
-
Recording:
/appl/local/training/4day-20231003/files/LUMI-4day-20231003-2_03_Debugging_at_Scale.pdf
These materials can only be distributed to actual users of LUMI (active user account).
Q&A
-
Sometimes my jobs hang for a long time at start before anything apparently happens. Is cray-stat the right tool to debug this?
-
You can use gdb4hpc and attach to the hanging job.
-
How to properly load a coredump from a Singularity container using rocgdb?
-
Not sure how to answer this question. Assuming you have the debugging symbols of the application, could you open a shell in the container, with rocgdb mounted into it, and open the core file from there?
-
What is the difference between the sanitizer tools and the standard debugging tools offered by the CCE that we discussed yesterday (e.g. options like -h bounds, etc.)?
-
Those compiler options add runtime checks for some special cases (like bounds checking). ASan (AddressSanitizer) offers more checks (see the corresponding web pages, e.g. https://clang.llvm.org/docs/AddressSanitizer.html).
-
The sanitizers are in all compilers nowadays; sanitizers4hpc is just an aggregator of their outputs across multiple ranks.
-
Was CRAY_ACC_DEBUG only for OpenMP applications?
-
Correct, it is only for CCE, with OpenACC and OpenMP.