Moving your AI training jobs to LUMI workshop - CSC, Espoo, February 4-5 2025

Course organisation

Setting up for the exercises

During the course

If you have an active project on LUMI, you should be able to do the exercises in that project. To reduce waiting time during the workshop, use the Slurm reservations we provide (see above).

You can find all exercises on our AI workshop GitHub page.

After the termination of the course project

More information will follow after the course.

Course materials

Note: Some links in the table below will remain invalid until after the course, when all materials have been uploaded.

| Presentation | Slides | Recording |
|---|---|---|
| Welcome and course introduction | / | video |
| Introduction to LUMI | slides | video |
| Using the LUMI web-interface | slides | video |
| Hands-on: Run a simple PyTorch example notebook | / | video |
| Your first AI training job on LUMI | slides | video |
| Hands-on: Run a simple single-GPU PyTorch AI training job | / | video |
| Understanding GPU activity & checking jobs | slides | video |
| Hands-on: Checking GPU usage interactively using rocm-smi | / | video |
| Running containers on LUMI | slides | video |
| Hands-on: Pull and run a container | / | video |
| Building containers from Conda/pip environments | slides | video |
| Hands-on: Creating a conda environment file and building a container using cotainr | / | video |
| Extending containers with virtual environments for faster testing | slides | video |
| Scaling AI training to multiple GPUs | slides | video |
| Hands-on: Converting the PyTorch single GPU AI training job to use all GPUs in a single node via DDP | / | video |
| Extreme scale AI | slides | video |
| Demo/Hands-on: Using multiple nodes | / | video |
| Loading training data on LUMI | slides | video |
| Coupling machine learning with HPC simulation | slides | video |
| Hands-on: Advancing your project and general Q&A | / | / |