Extreme-scale AI
Presenters: Samuel Antão and Paul Bauer (AMD)
Content:
- Model parallelism on LUMI via FSDP or DeepSpeed
- Scaling beyond a single node
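As a rough illustration of what scaling beyond a single node involves, the sketch below maps the per-task variables that Slurm sets (`SLURM_PROCID`, `SLURM_NTASKS`, `SLURM_LOCALID`) to the `RANK`, `WORLD_SIZE`, and `LOCAL_RANK` names conventionally read by PyTorch distributed launch setups. It assumes one task per GPU started with `srun`. The helper name `distributed_env_from_slurm` is hypothetical and is not taken from the slides.

```python
import os


def distributed_env_from_slurm(env=None):
    """Translate Slurm task variables into RANK / WORLD_SIZE / LOCAL_RANK.

    Hypothetical helper for illustration; assumes one Slurm task per GPU.
    Falls back to a single-process layout when the variables are absent.
    """
    env = os.environ if env is None else env
    return {
        # Global index of this task across all nodes in the job step.
        "RANK": str(int(env.get("SLURM_PROCID", 0))),
        # Total number of tasks in the job step.
        "WORLD_SIZE": str(int(env.get("SLURM_NTASKS", 1))),
        # Index of this task on its own node, typically used to pick a GPU.
        "LOCAL_RANK": str(int(env.get("SLURM_LOCALID", 0))),
    }


if __name__ == "__main__":
    # Show the mapping for the current process.
    for key, value in distributed_env_from_slurm().items():
        print(f"{key}={value}")
```

A training script would typically export these values, together with a `MASTER_ADDR` and `MASTER_PORT` chosen by the job script, before initialising the process group used by FSDP or DeepSpeed.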
Extra materials
Links from the slides
- Advice on RCCL tuning, which may be important for stability at very large scale (slide 11, with more information on slide 12)
Q&A
/