OpenMPI

License information

Open MPI is distributed under a 3-clause BSD license that can be found on the "Open MPI License" page on the Open MPI web site.

From version 4.1.6 on, the license can also be found as the file LICENSE in $EBROOTOPENMPI/share/licenses/OpenMPI after loading the OpenMPI module.
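
For example (a minimal sketch, assuming an OpenMPI module is installed and the corresponding software stack is loaded):

    module load OpenMPI
    less $EBROOTOPENMPI/share/licenses/OpenMPI/LICENSE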

User documentation

Open MPI is currently provided as-is, without any support promise. It is not entirely free of problems: Open MPI 4 was not designed for the interconnect of LUMI, while Open MPI 5 requires software that, for security reasons, is not yet on the system. Our only fully supported MPI implementation on LUMI is HPE Cray MPICH as provided by the cray-mpich module. We also do not intend to rebuild other software with Open MPI beyond what is needed for some testing.

Open MPI treats UCX as its preferred communication library, with OFI (libfabric) often a second choice, certainly when it comes to GPU support. On LUMI, however, using the libfabric library provided by HPE Cray is important, especially to link to the libfabric provider for the Slingshot 11 network of LUMI. Currently that libfabric+provider implementation is not entirely feature-complete: routines that Cray MPICH does not need may be missing. Hence we cannot exclude compatibility problems.
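
To see what the libfabric installation on a node actually offers, the fi_info utility that ships with libfabric can be used (a sketch; it assumes fi_info is in the path and that the Slingshot 11 provider carries its usual name, cxi):

    # List all libfabric providers found on this node
    fi_info -l
    # Show the capabilities advertised by the Slingshot 11 (cxi) provider
    fi_info -p cxi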

Two builds from 4.1.6 onwards

Building Open MPI in the cpe* toolchains may seem to make no sense as the MPI module loaded by those toolchains cannot be used, and neither can the MPI routines in LibSci. However, as it is a lot of work to set up specific toolchains without Cray MPICH (or possibly even LibSci and other PE modules), we've chosen to still use the cpe* toolchains and then unload certain modules.

  • The -noCrayWrappers modules provide MPI compiler wrappers that directly call the underlying compiler.

  • The regular modules provide MPI compiler wrappers that in turn call the HPE Cray compiler wrappers so, e.g., adding target architecture options should still work via the target modules.

Note that as a user of these modules you will have to unload some modules manually after loading the OpenMPI module; failing to do so may result in wrong builds of software. Due to limitations of Lmod it was not possible to automate this; a sketch of a typical load sequence is given below.
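
A sketch of what such a load sequence could look like (module versions and the exact set of modules to unload are examples, not authoritative; check the module help of your version):

    # Example: a user-installed cpeGNU 23.09 build on LUMI-C
    module load LUMI/23.09 partition/C
    module load OpenMPI/4.1.6-cpeGNU-23.09
    # cray-mpich cannot be used together with this module, so unload it
    # (the MPI routines in cray-libsci cannot be used either)
    module unload cray-mpich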

Known issues

  1. When starting a job on a single node, one gets a warning starting with

    Open MPI failed an OFI Libfabric library call (fi_domain).  This is highly
    unusual; your job may behave unpredictably (and/or abort) after this.
    

    So far it appears that this warning can be safely ignored. It is not entirely clear what the cause is. It may be related to the Slingshot network plugin of Slurm not initialising a virtual network interface as Cray MPICH doesn't use libfabric for intra-node communication.

It is not clear if or when this issue can be solved.

User-installable modules (and EasyConfigs)

Install with the EasyBuild-user module:

eb <easyconfig> -r

To access module help after installation, and to be reminded of the stacks and partitions for which the module is installed, use module spider OpenMPI/<version>.
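
A concrete example (a sketch: the EasyConfig name and stack version are assumptions; use eb --search OpenMPI to see which EasyConfigs are actually available):

    # Select the software stack and partition to install for
    module load LUMI/23.09 partition/C
    module load EasyBuild-user
    # Install Open MPI, resolving missing dependencies (-r)
    eb OpenMPI-4.1.6-cpeGNU-23.09.eb -r
    # Check where the new module is available
    module spider OpenMPI/4.1.6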

Technical documentation

Note that we cannot fully support Open MPI on LUMI. Users should get decent performance on LUMI-C, but despite including the rocm modules, this is not a GPU-aware build of MPI: GPU awareness currently requires UCX, which is not supported on the Slingshot 11 interconnect of LUMI.
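
One way to verify that a given build lacks GPU support is to query it with Open MPI's own ompi_info tool (a sketch; the exact strings in the output differ between Open MPI versions):

    # No output indicates that neither ROCm nor CUDA support was compiled in
    ompi_info | grep -i -e rocm -e cuda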

The primary MPI implementation on LUMI and the only one which we can fully support remains the implementation provided by the cray-mpich modules on top of libfabric as the communication library. The Cray MPICH implementation also contains some optimisations that are not available in the upstream MPICH installation but are essential for scalability in certain large runs on LUMI.

EasyBuild

Version 4.1.2 for cpeGNU 22.06

  • A development specifically for LUMI.

    The main goal was to have mpirun available. The module can be used with containers, with some care to ensure that the container indeed uses the libraries from this module, and it turns out to give good enough performance on LUMI for most purposes. It is not GPU-aware though, so there is no direct GPU-to-GPU communication.
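
    A minimal usage sketch under these assumptions (the module version is an example; compile with the wrappers from this module and launch with its mpirun from within a Slurm allocation):

        module load OpenMPI/4.1.2-cpeGNU-22.06
        module unload cray-mpich
        mpicc -O2 -o hello hello.c
        # From within a Slurm job allocation on LUMI-C:
        mpirun -n 4 ./hello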

Version 4.1.3 for cpeGNU 22.08

  • A simple port of the 4.1.2 recipe, with the same restrictions, but with one addition: the libfabric version is now detected automatically, so the module can still work after a system update that installs a new libfabric.

  • Later on, the ROCm dependency, which should not have been there in the first place, was removed.

Version 4.1.5 for cpeGNU 22.12

  • A straightforward port of the 4.1.3 EasyConfig.

Version 4.1.6 for 23.09

  • The EasyConfig is heavily reworked, now also building a module that provides more help information.

    Two approaches were chosen:

    • The -noCrayWrappers modules provide MPI compiler wrappers that directly call the underlying compiler.

    • The regular modules provide MPI compiler wrappers that in turn call the HPE Cray compiler wrappers so, e.g., adding target architecture options should still work via the target modules.

    It doesn't really make sense to use the cpe* toolchains as such, since cray-mpich and the MPI support in cray-libsci cannot be used. In fact, it is better to unload cray-mpich (and this is required for the version that uses the HPE Cray compiler wrappers) to avoid picking up the wrong libraries.

  • Also added the license information to the installation.

  • We tried to extend the module to unload the Cray PE components that are irrelevant or even harmful, but that did not work: Lmod cannot load and unload the same module in a single call to the module command, and this situation occurs when the cpe* module is not yet loaded at the time the OpenMPI module is loaded.