OpenMPI

License information

Open MPI is distributed under a 3-clause BSD license that can be found on the "Open MPI License" page on the Open MPI web site.

From version 4.1.6 on, the license can also be found as the file LICENSE in $EBROOTOPENMPI/share/licenses/OpenMPI after loading the OpenMPI module.
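
For example (a minimal illustration only; the module still has to be loaded from a software stack in which it has been installed):

    module load OpenMPI
    less $EBROOTOPENMPI/share/licenses/OpenMPI/LICENSE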

User documentation

Open MPI is currently provided as-is, without any support promise. It is not entirely free of problems: Open MPI 4 was not really designed for the interconnect of LUMI, while Open MPI 5 needs a custom libfabric library and CXI provider, and for optimal integration with Slurm it requires software that, for security reasons, is not yet on the system. Our only fully supported MPI implementation on LUMI is HPE Cray MPICH as provided by the cray-mpich module. We also do not intend to rebuild other software with Open MPI beyond what is needed for some testing.

Open MPI has UCX as its preferred communication library, with OFI (libfabric) often treated as a second choice, certainly when it comes to GPU support. On LUMI, using the HPE Cray provided libfabric library is rather important, especially to link to the libfabric provider for the Slingshot 11 network of LUMI. That default libfabric+provider implementation is currently not entirely feature-complete: routines that Cray MPICH does not need may be missing. Hence we cannot exclude that there will be compatibility problems. For Open MPI 5, a special libfabric version with features specific to Open MPI 5 is used, but that too is a relatively early release.

Two Open MPI 4 builds from 4.1.6 onwards

Building Open MPI in the cpe* toolchains may seem to make no sense as the MPI module loaded by those toolchains cannot be used, and neither can the MPI routines in LibSci. However, as it is a lot of work to set up specific toolchains without Cray MPICH (or possibly even LibSci and other PE modules), we've chosen to still use the cpe* toolchains and then unload certain modules.

  • The -noCrayWrappers modules provide MPI compiler wrappers that directly call the underlying compiler.

  • The regular modules provide MPI compiler wrappers that in turn call the HPE Cray compiler wrappers so, e.g., adding target architecture options should still work via the target modules.

Note that as a user of these modules you will have to unload some modules manually after loading the OpenMPI module; due to Lmod limitations this could not be automated. Failing to do so may result in incorrect builds of software.
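
As an illustration only (the module name and version below are examples; use the OpenMPI module that matches your software stack and partition):

    module load OpenMPI/4.1.6-cpeGNU-23.09   # example module name
    module unload cray-mpich                 # avoid picking up the wrong MPI libraries
    module unload cray-libsci                # its MPI routines cannot be used either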

Known issues

  1. When starting a job on a single node with Open MPI 4, one gets a warning starting with

    Open MPI failed an OFI Libfabric library call (fi_domain).  This is highly
    unusual; your job may behave unpredictably (and/or abort) after this.
    

    Slurm on LUMI does not initialise a virtual network interface for a job step that uses only one node, as Cray MPICH will never use it. However, Open MPI relies on libfabric also for intra-node communication and does check the network interface, leading to this message. It can be safely ignored, but you can also get rid of it by using the --network=single_node_vni flag with srun. (The Cray documentation says that there are cases where --network=single_node_vni,job_vni,def_tles=0 is needed but we haven't seen such cases yet.)
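
    For example (the process count and application name are placeholders):

        # Suppress the fi_domain warning on a single-node job step
        srun --network=single_node_vni --nodes=1 --ntasks=8 ./my_mpi_app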

It is not clear if or when these issues can be solved.

User-installable modules (and EasyConfigs)

Install with the EasyBuild-user module:

eb <easyconfig> -r

To access module help after installation and to see for which stacks and partitions the module is installed, use module spider OpenMPI/<version>.
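
For example (the EasyConfig file name below is only illustrative; use the name of an EasyConfig that is actually provided):

    eb OpenMPI-4.1.6-cpeGNU-23.09.eb -r
    module spider OpenMPI/4.1.6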

EasyConfig:

Technical documentation

Note that we cannot fully support Open MPI on LUMI. Users should get decent performance on LUMI-C, but even though the rocm modules are included, this is not a GPU-aware MPI: that currently requires UCX, which is not supported on the Slingshot 11 interconnect of LUMI.

The primary MPI implementation on LUMI and the only one which we can fully support remains the implementation provided by the cray-mpich modules on top of libfabric as the communication library. The Cray MPICH implementation also contains some optimisations that are not available in the upstream MPICH installation but are essential for scalability in certain large runs on LUMI.

EasyBuild

Version 4.1.2 for cpeGNU 22.06

  • A development specifically for LUMI.

    The main goal was to have mpirun available. The module can also be used with containers, with some care to ensure that the container indeed uses the libraries from this module, and it turns out to give good enough performance on LUMI for most purposes. It is not GPU-aware though, so there is no direct GPU-to-GPU communication.

Version 4.1.3 for cpeGNU 22.08

  • A simple port of the 4.1.2 recipe, with the same restrictions, but with one addition: the libfabric version is now detected automatically so that the module keeps working after a system update installs a new libfabric.

  • Later on, the ROCm dependency, which should not even have been there, was removed.

Version 4.1.5 for cpeGNU 22.12

  • A straightforward port of the 4.1.3 EasyConfig.

Version 4.1.6 for 23.09

  • The EasyConfig is heavily reworked, now also building a module that provides more help information.

    Two approaches were chosen:

    • The -noCrayWrappers modules provide MPI compiler wrappers that directly call the underlying compiler.

    • The regular modules provide MPI compiler wrappers that in turn call the HPE Cray compiler wrappers so, e.g., adding target architecture options should still work via the target modules.

    It doesn't really make sense to use the cpe* toolchains, as cray-mpich and the MPI support in cray-libsci cannot be used. In fact, it is even better to unload cray-mpich (and this is required for the version that uses the HPE Cray compiler wrappers) to avoid picking up the wrong libraries.

  • Also added the license information to the installation.

  • We tried to extend the module to unload the Cray PE components that are irrelevant or even harmful, but that failed: Lmod cannot load and unload the same module within a single call to the module command, which is what happens if the cpe* module is not yet loaded when the OpenMPI module is loaded.

Experimental GPU support with version 5.0.8 for 24.03

  • There is an EasyConfig with experimental support for ROCm (AMD GPUs) that uses custom dependencies:

    • Slingshot Host Software (SHS) elements with an updated LibCXI

    • A recent OFI libfabric release (v2.3.0) with ROCm support

    • ROCm 6.2.2

  • It also requires careful build and runtime configuration (a consolidated launch sketch follows this list):

    • There is no direct Slurm integration via the PMIx MPI plugin, so programs need to be launched via mpirun.

    • Slurm affinity bindings are not necessarily honoured, so one needs to apply mpirun-specific binding options.

    • An experimental OFI provider, LinkX, is used to allow intra-node shared memory communication alongside the CXI (Slingshot) network for inter-node transfers.

    • To run with the CXI provider only (network only), one needs to override the following settings:

      • OMPI_MCA_opal_common_ofi_provider_include=cxi for mpirun

      • moreover, to allow multiple processes per node, one needs to enable network VNIs using Slurm's job allocation option --network=single_node_vni

    • To run interactively from within an existing Slurm job step, one needs to allow the Open MPI daemon to overlap with the resources of the existing allocation via the environment variable:

      • PRTE_MCA_plm_slurm_args="--overlap --exact"
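
A minimal launch sketch putting the above together (the allocation parameters, process counts, mapping/binding choices and application name are illustrative placeholders, not a tested recipe):

    # Job allocation: enable network VNIs, also needed for single-node use
    salloc --nodes=2 --ntasks-per-node=8 --network=single_node_vni

    # Restrict Open MPI to the CXI (network-only) provider
    export OMPI_MCA_opal_common_ofi_provider_include=cxi

    # Only when launching from within an existing Slurm job step
    export PRTE_MCA_plm_slurm_args="--overlap --exact"

    # Launch via mpirun (not srun) and set the bindings on the mpirun command line
    mpirun -np 16 --map-by ppr:8:node --bind-to core ./my_gpu_app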

Archived EasyConfigs

The EasyConfigs below are additional EasyConfigs that are not directly available on the system for installation. Users are advised to use the newer ones above; these archived ones are unsupported. They are still provided as a source of information should you need it, e.g., to understand the configuration that was used for earlier work on the system.