Hands-on: Hyper-parameter tuning the PyTorch model using Ray¶
Exercises on the course GitHub.
Q&A¶
-
In
run.sh
in the reference solution, memory is specified to be 0 (#SBATCH --mem=0
). Why does this work? Why do we not need to specify e.g. 480G?-
(Gregor)
--mem=0
will use all the available memory on the node, it's just a faster way of specifying "use all the available memory on the node".--mem=480G
works as well. -
(Kurt) but it is actually better to specify that you want 480G as there is a subtle difference: Asking for 480G guarantees that you will get 480G. You can get a lot less with
--mem=0
if, e.g., due to a memory leak in the system software - which has happened - more memory is consumed by the OS. So asking--mem=480G
guarantees you a node that is healthy at least in that respect.
-