Your first training job on LUMI

Presenters: Mats Sjöberg (CSC) and Marlon Tobaben (CSC)

Content:

  • Using LUMI via the command line
  • Submitting and running AI training jobs using the batch system

Extra materials

Q&A

  1. Is the --mem-per-gpu parameter required in the Slurm batch script? If you do not specify it, do you get all available memory for one GPU?

    • What you get depends on the partition, so to be safe you should always specify how much RAM you need. On standard-g you get all the memory, but on small-g you do not and may get much less than you expect. The amount you get there is a fixed quantity, independent of the number of GPUs you request. Another reason to make a proper memory request is that it can protect you from getting a node where memory is already taken by, e.g., a memory leak. We saw that in the early years of LUMI.
  2. How do I add log file parameters to the batch script?

    • You can redirect output to a file by adding the following to your Slurm script:

      • To redirect stdout: #SBATCH -o <name of output file>

      • To redirect stderr: #SBATCH -e <name of error output file>
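
The two answers above can be combined into one batch script. The following is only a minimal sketch: the project ID, the memory amount, the time limit, and the file names are placeholders chosen for illustration, not values from the session, and the training command at the end is a hypothetical example. The `%j` in the file names is a Slurm filename pattern that expands to the job ID.

```shell
#!/bin/bash
# Sketch of a batch script. All values are placeholders, not recommendations.
# Hypothetical project ID, replace with your own:
#SBATCH --account=project_XXXXXXXXX
#SBATCH --partition=small-g
#SBATCH --gpus-per-node=1
# Explicit memory request per GPU (example value, adjust to your needs):
#SBATCH --mem-per-gpu=60G
#SBATCH --time=00:10:00
# Redirect stdout and stderr to separate files; %j expands to the job ID:
#SBATCH -o train-%j.out
#SBATCH -e train-%j.err

# Slurm sets these variables inside a job. The fallbacks only apply when the
# script is run outside Slurm, e.g. for a quick syntax check with plain bash.
JOB_ID="${SLURM_JOB_ID:-none}"
MEM_PER_GPU="${SLURM_MEM_PER_GPU:-unset}"

echo "job id: ${JOB_ID}"
echo "requested memory per GPU: ${MEM_PER_GPU}"
echo "this line ends up in the stderr file" >&2

# The actual training command would follow here, for example
# (train.py is a hypothetical script name):
# srun python3 train.py
```

Assuming the script is saved as `train.sh` (a name chosen here for illustration), it would be submitted with `sbatch train.sh`. Printing the job ID and the memory request at the start of the job makes it easier to check later, from the output file alone, what the job actually asked for.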