
Building containers from Conda/pip environments

Presenter: Julius Roeder (DeiC)

Content:

  • Containers from conda/pip environments
  • Recipes for PyTorch, TensorFlow, and JAX/Flax on LUMI

Extra materials

Remarks to things mentioned in the recording

ROCm compatibility

The compatibility situation is actually even more complicated than explained in this presentation. The kernel driver for the GPUs only supports certain kernel versions, and the kernel version in turn depends on the version of the management interface of LUMI. So to upgrade to a newer ROCm driver, we often have to update nearly everything on LUMI.

Furthermore, we need to ensure that MPI also works. GPU-aware MPI also depends on the versions of ROCm and of the driver. So before updating to a new ROCm version, we also need versions of the HPE Programming Environment that are compatible with that ROCm version, or all users of traditional HPC simulation codes would be very unhappy. That can also be a factor that stops an update.

Images in /appl/local/containers/sif-images

It is important to realise that the base images in /appl/local/containers/sif-images are symbolic links to the actual containers, and the containers they point to change over time without warning. That may be a problem if you build on top of them, as software you installed on top of them may suddenly become incompatible with the newer packages in the container. So if you do that (topic of the next presentation), it is better to make a copy of the container and use that copy.
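A minimal Python sketch of the "make a copy" advice, assuming you simply want to pin whatever image a symbolic link currently points to. The helper names (`snapshot_container`, `link_changed`) and the destination directory are hypothetical; the code only uses the standard library to resolve the link, copy the real file, and record a SHA-256 checksum so you can later detect that the link now points to a different image.

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot_container(link_path, dest_dir):
    """Copy the real image behind a (possibly symlinked) path into dest_dir.

    Hypothetical helper, not a LUMI tool. Returns (copy_path, checksum).
    """
    target = os.path.realpath(link_path)  # follow the symlink chain to the actual file
    os.makedirs(dest_dir, exist_ok=True)
    copy_path = os.path.join(dest_dir, os.path.basename(target))
    shutil.copy2(target, copy_path)  # copy contents and file metadata
    return copy_path, sha256_of(copy_path)


def link_changed(link_path, recorded_checksum):
    """True if the image behind link_path no longer matches the recorded checksum."""
    return sha256_of(os.path.realpath(link_path)) != recorded_checksum
```

For example, `snapshot_container("/appl/local/containers/sif-images/<image>.sif", "<your project directory>/containers")` (placeholders, fill in your own names) gives you a private copy to build on, and the checksum lets you notice when the shared link has moved on.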

EasyBuild actually gets its container images from /appl/local/containers/easybuild-sif-images, which contains copies of the container images as they were when the corresponding EasyConfig was created, so that an EasyConfig with a given name will always use the same container image. This is done to improve reproducibility. It also makes it easier to adapt the bindings to each specific container. E.g., some containers required $WITH_CONDA; we solved this by injecting some environment variables via the module, which is no longer needed for containers built in 2025. Or some containers require a libfabric library from the system, while others have one built into the container.

It is possible to extend an existing container with a virtual environment (topic of the next presentation) and automate that with EasyBuild, but it is complex enough that it might require help from someone with sufficient EasyBuild experience. An example is this EasyConfig, but this is not something that an inexperienced user should try to create.

What does the tool provided by lumi-container-wrapper do?

The lumi-container-wrapper provides a tool that lets you do some pip and conda installations in a file-system-friendly way. It also uses a base container, but that one does not contain ROCm, so it is of little use for AI software unless you can use the ROCm from the system. It does not change the base container but installs the software in a separate SquashFS file. Furthermore, for each command it can find in the container, it creates a wrapper script outside the container that calls singularity with the right bindings to run that command in the container. It is actually rather hard to start the container "by hand" using the singularity command, as you also have to create the right bind mount for the SquashFS file containing the actual software installation.
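To make the wrapper idea concrete, here is a hedged Python sketch that generates the kind of shell wrapper described above. It is not the actual code or output of lumi-container-wrapper: the function names, paths, and the mount point `/opt/env` are hypothetical. It assumes Singularity/Apptainer's support for bind-mounting a SquashFS image file with `--bind <file>:<dest>:image-src=/`.

```python
import os
import shlex


def make_wrapper(command, container_sif, squashfs_file, mount_point="/opt/env"):
    """Return the text of a bash script that runs `command` inside the container.

    The SquashFS file holding the installed software is bind-mounted at
    mount_point, and all arguments given to the wrapper are passed through.
    Hypothetical illustration, not lumi-container-wrapper's real output.
    """
    bind = f"{squashfs_file}:{mount_point}:image-src=/"
    exe = f"{mount_point}/bin/{command}"
    argv = ["singularity", "exec", "--bind", bind, container_sif, exe]
    # exec replaces the wrapper shell; "$@" forwards the user's arguments unchanged
    return "#!/bin/bash\nexec " + " ".join(shlex.quote(a) for a in argv) + ' "$@"\n'


def write_wrappers(commands, bin_dir, container_sif, squashfs_file):
    """Write one executable wrapper per command into bin_dir; return their paths."""
    os.makedirs(bin_dir, exist_ok=True)
    paths = []
    for cmd in commands:
        path = os.path.join(bin_dir, cmd)
        with open(path, "w") as fh:
            fh.write(make_wrapper(cmd, container_sif, squashfs_file))
        os.chmod(path, 0o755)  # make the wrapper executable
        paths.append(path)
    return paths
```

Putting `bin_dir` on your `PATH` then makes `python` or `pip` behave like local commands. It also shows why running such a container by hand is awkward: you have to reproduce the container path, the SquashFS file, and the mount point yourself.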

The cotainr tool, on the other hand, takes the selected base image and builds a new container from it that can be used the way containers are normally used.
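For comparison, a cotainr build is a single command that produces an ordinary SIF file. The sketch below writes a conda environment file and assembles the `cotainr build` command line in Python. It assumes cotainr's `--base-image` and `--conda-env` options (check `cotainr build --help` for the version installed on LUMI); the environment contents, file names, and the `build` helper are hypothetical placeholders.

```python
import subprocess
from pathlib import Path

# Hypothetical environment; choose package versions that match the ROCm in your base image.
CONDA_ENV_YML = """\
name: my_env
channels:
  - conda-forge
dependencies:
  - python=3.12
  - pip
  - pip:
      - numpy
"""


def cotainr_build_cmd(sif_out, base_image, env_file):
    """Return the argv for a cotainr build from a base image and a conda env file."""
    return [
        "cotainr", "build", str(sif_out),
        f"--base-image={base_image}",
        f"--conda-env={env_file}",
    ]


def build(sif_out, base_image, workdir=".", dry_run=True):
    """Write the env file and return the command; run it only if dry_run is False."""
    env_file = Path(workdir) / "my_env.yml"
    env_file.write_text(CONDA_ENV_YML)
    cmd = cotainr_build_cmd(sif_out, base_image, env_file)
    if not dry_run:
        subprocess.run(cmd, check=True)  # requires cotainr on PATH
    return cmd
```

The result is a self-contained SIF that you start with a plain `singularity exec <image>.sif ...`, with no extra SquashFS bind mount to manage.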

Q&A
