The HPE Cray Programming Environment¶
On LUMI, the main programming environment is the HPE Cray Programming Environment (further abbreviated as Cray PE). The environment provides several tools, including compilers, communication libraries, optimised math libraries and various other libraries, analyzers and debuggers.
The Cray PE is made available through environment modules that allow to select particular versions of tools and to configure the environment in a flexible way.
HPE Cray PE components¶
Cray compiler environments¶
Cray PE supports multiple compilers, Cray and third party compilers as well: AOCC, Cray, Intel, GNU.
Users can access the compilers loading a programming environment module (see below).
Compilers in the Cray Programming Environment are almost always used through the Cray compiler wrappers. In fact, until recently, some compilers could not be used without the wrappers as they failed to even detect their own include files and some components. The behaviour of the wrapper will depend on the other modules that are loaded in the environment. E.g., there is no separate wrapper for MPI as the MPI header files and libraries are automatically activated through the regular wrappers as soon as a Cray MPI module is loaded.
The commands to invoke compiler wrappers are
CC (C++). They wrap automatically
to the right compilers based on the programming environment module that is loaded and the compiler module.
The online help can be accessed with the
-help option. E.g.:
One of the most frequently used options of the compiler wrappers is
ftn -help | grep verbose -A 1
-craype-verbose Print the command which is forwarded
to compiler invocation
info cc or
man cc will display the man page of the C compiler wrapper.
The compiler wrappers call the correct compiler in the currently loaded programming environment, with appropriate options to build and link applications with relevant libraries, as required by the modules loaded. Besides a number of generic options, they also will pass all other options to the underlying compile, so you can still use the regular compiler options also.
The compiler wrappers should replace direct calls to compiler drivers in Makefiles and build scripts to ensure that the proper libraries are used.
For system libraries, only dynamic linking is supported by compiler wrappers on the Cray EX system
Cray Compiling Environment (CCE)¶
The Cray Compiling Environment is set by the module
PrgEnv-cray, which is loaded by default at login.
The compiler itself is contained in the
CCE consists of Cray compilers performing code analysis during compilation to generate highly optimized code. Supported languages include Fortran, C and C++, and UPC (Unified Parallel C). The Cray C/C++ compiler is basically the Clang/LLVM compiler with a back-end configured by HPE Cray that also adds some Cray-specific options while the Cray Fortran compiler uses an HPE-Cray specific front-end with a back-end based on LLVM. The Fortran compiler supports most of Fortran 2018 and tends to be considerably stricter than the GNU or Intel Fortran compilers.
The CCE compilers will also support OpenMP offload to NVIDIA and AMD GPUs but that is still very immature and very much work in progress at the time of the development of this tutorial. The Fortran compiler (but not the C/C++ compiler) also supports OpenACC for offloading to GPU and is being updated to the newest versions of this standard.
The classic pre-LLVM Cray C/C++ compiler is not available on Cray EX systems.
Compiler-specific manpages can be accessed on the system with
man craycc or
More details are given in the Cray Fortran Reference Manual and the Cray Compiling Environment Release that unfortunately are hidden deep in the new support pages of HPE. The Clang Compiler User’s Manual is another source of information for the Cray C and C++ Clang compilers.
For more information about compiler pragmas and directives, see
man intro_directives on the system.
The GNU C/C++ and Fortran compilers are probably the best supported third party compiler in the Cray PE.
Compiler-specific manpages can be accessed on the system with
man gcc or
More details are provided by the GCC online documentation.
There AOCC compilers, the AMD Optimizing C/C++ Compiler and matching fortran compilers, AMD's compiler offering for CPU-only systems, have a matching programming environment module and are a full citizen of the Cray PE.
Compiler-specific documentation is available in the AOCC User Guide.
Cray provides a bundled package of support libraries to install into the PE environment to enable AOCC, and Cray PE utilities such as debuggers and performance tools work with AOCC.
AMD ROCm compilers¶
The AMD ROCm compilers (in the AMD world sometimes known as AOMP) are supported on systems with AMD CPUs. However, at the time of writing of this tutorial they are not yet available in the LUMI environment and the integration with the Cray environment still seems to be work in progress.
Intel (not on LUMI)¶
The Cray PE also provides a programming environment module to enable the Intel® oneAPI compiler and tools.
The documentation is available in the Intel® oneAPI Programming Guide
Cray provides a bundled package of support libraries to install into the Cray PE to enable the Intel compiler, allowing utilities such as debuggers and performance tools to work with it.
NVIDIA HPC toolkit (not on LUMI)¶
The NVIDIA HPC Toolkit compilers (formerly PGI) are supported on systems with NVIDIA GPUs.
Cray Scientific and Math Library¶
The Cray Scientific and Math Libraries (CSML, also known as LibSci) are a collection of numerical routines optimized for best performance on Cray systems.
These libraries satisfy dependencies for many commonly used applications on Cray systems for a wide variety of domains.
When the module for a CSML package (such as
cray-fftw) is loaded,
all relevant headers and libraries for these packages are added to the compile
and link lines of the
CC compiler wrappers, so linking with them is
completely transparent (to the extent that users wonder where the libraries are).
The CSML collection contains the following Scientific Libraries:
- BLAS (Basic Linear Algebra Subroutines)
- BLACS (Basic Linear Algebra Communication Subprograms)
- CBLAS (Collection of wrappers providing a C interface to the Fortran BLAS library)
- IRT (Iterative Refinement Toolkit)
- LAPACK (Linear Algebra Routines)
- LAPACKE (C interfaces to LAPACK Routines)
- ScaLAPACK (Scalable LAPACK)
libsci_acc(library of Cray-optimized BLAS, LAPACK, and ScaLAPACK routines)
- NetCDF (Network Common Data Format)
- FFTW3 (the Fastest Fourier Transforms in the West, release 3)
Cray Message Passing Toolkit¶
MPI is a widely used parallel programming model that establishes a practical, portable, efficient, and flexible standard for passing messages between ranks in parallel processes.
Cray MPI is derived from Argonne National Laboratory MPICH and implements the MPI-3.1 standard as documented by the MPI Forum in MPI: A Message Passing Interface Standard, Version 3.1.
Support for MPI varies depending on system hardware. To see which functions and environment variables the
system supports, please have a look at the corresponding man pages with
man intro_mpi on the system.
Note that though on LUMI at the time of the tutorial both the OpenFabric Interface (OFI) based and UCX-based versions of the library are supported, the finalised LUMI system will only support OFI.
- Website: https://pe-cray.github.io/cray-dsmml
Distributed Symmetric Memory Management Library (DSMML) is a HPE Cray proprietary memory management library.
DSMML is a standalone memory management library for maintaining distributed shared symmetric memory heaps for top-level PGAS languages and libraries like Coarray Fortran, UPC, and OpenSHMEM.
DSMML allows user libraries to create multiple symmetric heaps and share information with other libraries.
Through DSMML, interoperability can be extracted between PGAS programming models.
Further details are available in the man page on the system with
Cray Performance and Analysis Tools:
Tools to analyze the performance and behavior of programs that are run on Cray systems, and a Performance API (PAPI).
Cray Debugging Support Tools:
Debugging tools, including
Configuring the Cray PE through modules¶
Multiple releases of the HPE Cray PE can be installed simultaneously on the system and users can mix-and-match
components. However, it is also possible to load only components for a specific release of the PE.
Cray PE versions have version numbers of the form
22.02 for the version released in
February of 2022. However, each of the components have their own version number and it is not easy to see
which version of a component came with which version(s) of the Cray PE.
Below we only discuss those modules that are important when building software with EasyBuild. Debuggers, profilers, etc., are not included in the list.
The programming environment modules¶
The usual way to initialise the Cray PE is by loading one of the
PrgEnv-* modules. This module will then
load the appropriate compiler, compiler wrappers and other libraries. Some of the components that will be
loaded are configured through the
/etc/cray-pe.d/cray-pe-configuration.sh file, so the list of modules
may depend on the actual system that you are using.
The Cray PE supports the following
PrgEnv-* modules. On LUMI, only the first three are currently available:
|The Cray Compiling Environment compilers
|The GNU compilers
|AMD compilers for CPU-only systems
|AMD ROCm compilers for GPU systems
|The Intel compilers
|NVIDIA HPC toolkit compilers (formerly PGI)
PrgEnv-* modules belong to the same module family. Hence Lmod will automatically unload any
PrgEnv-* module when you load a different one.
Selecting the version through the cpe meta-module¶
The Cray PE on the EX system provides the meta-module
cpe: the purpose of the meta-module is
similar to the scope of the
cdt-cuda meta-modules available on the XC systems.
Loading one of the
cpe/yy.mm modules (e.g.,
cpe/22.02) has the following effects:
It sets the default versions of each of the Cray PE modules to the version that comes with that particular release of the Cray PE. E.g.,
module load cpe/22.02 module load cce
would load that version of the
ccecompiler that comes with the 22.02 release of the Cray PE.
It will reload all already loaded Cray PE modules and switch them over to the version corresponding to that particular release of the Cray PE.
Limitations and bugs
Due to the way Lmod works and implementation bugs in the
cpe modules, loading the
does not always have the desired effect.
The Cray PE sets the default version of each module by adding a file to the list of files in the
LMOD_MODULERCFILEenvironment variable. This is because Lmod does not re-evaluate the visibility of modules and the internal list of default version immediately when the value of
LMOD_MODULERCFILEis changed but only the next time the
modulecommand is executed. Henceand
module load cpe/22.02 ; module load ccedo not have the same effect. In the first version, the version of
module load cpe/22.02 cce
cceloaded is the version that corresponds to the 22.02 release of the Cray PE. In the second case however the default version of the
ccemodule is determined by whatever list of default modules was used when calling the
modulecommand so may or may not be the one of the 22.02 release.
cpemodule after loading the other Cray PE modules also does not always have the desired effect in many versions of the Cray PE. This is because of a bug in the
cpemodule that reloads the modules in the wrong order which may trigger the reload of a module with whatever version was the default when the
modulecommand was called rather than the version the the
cpemodule intends to (re-)load.
The compiler wrapper module craype¶
craype module loads the compiler wrappers. There is only one set of compiler wrappers for all compilers.
Which compiler will be called, which libraries will be included, but also processor and GPU target options will
be used, is all determined by other modules. Hence it is in principle possible to use a single Makefile for
a project and still reconfigure the build by loading certain modules.
The targets for CPU and GPU optimization, the network library for MPI bt also some other compiler options, can be set through target modules:
craype-x86-*(and similar options can be expected on ARM-based systems) set the target for CPU optimisations. For LUMI, the
craype-x86-trentomodules are relevant.
This can also be used to cross-compile to a different target architecture unless the compiler target gets overwritten by a compiler flag added to the command line through the Makefile or other means, something that unfortunately happens more and more often in faulty software installation procedures.
creype-accel-*sets the target for OpenMP offload (and likely other technologies in the future). E.g., loading
craype-accel-amd-gfx90atells the compilers to target AMD MI200 family GPUs, while loading
craype-accel-hosttells the compiler to use the CPU instead (according to the documentation, the latter is for
craype-network-*selects the communication library to be used by Cray MPICH. On Slingshot 11 EX systems, only
craype-network-ofiis supported, but Slingshot 10 EX systems also offer support for UCX through the
craype-hugepages*modules enable Cray Huge Pages support. To fully enable this support they have to be used at link-time and at run-time. At link time, support is compiled into the binary while at run-time they are used to set the actual size of the huge pages.
craype-hugepages*modules are not supported by all compilers. E.g., the AOCC compiler does not support huge pages at the moment, and loading the module at link time will cause an error message from the linker.
The compiler modules¶
The compiler modules have already been discussed with the
PrgEnv-* modules above. The different regular
compiler modules also all belong to the same family so no two different compilers can be loaded simultaneously
and Lmod will automatically unload the other compiler when a new one is loaded.
The MPI modules¶
To load the Cray MPI libraries, both one of the
craype-network-* modules and a compiler module has to be
loaded as the MPI libraries are both network- and compiler specific.
For some unknown reason, the MPI module for the libfabric (
craype-network-ofi) transport is called
while the library for the UCX transport (
craype-network-ucx) is called
craype-network-ucx. As a result,
the MPI module fails to reload automatically when switching between both transports, but it does reload automatically
when switching compilers.
Loading an MPI module will also automatically configure the regular compiler wrappers to compile with support for that MPI module. However, the libfabric and UCX versions of the MPI library have compatible interfaces, so it is always possible to swap between those versions at runtime.
The Cray Scientific libraries¶
The Cray Scientific Libraries are loaded through the
cray-libsci module (or
cray-libsci_acc for the GPU
versions). Loading this module makes the BLAS, LAPACK, and ScaLAPACK libraries available, and also the Cray IRT
(Iterative Refinement Toolkit), a Cray-specific library. It will also configure the compiler wrappers to link
with these libraries, so no additional include or link options are needed.
As FFTW is a third party library, it gets its own module (
cray-fftw) and is not included in the
cray-fftw module can only be loaded if one of the processor target modules (the
is loaded first.
Some unexpected behaviour of the Cray PE¶
On Cray EX systems, dynamic linking is the preferred way of linking applications. With that comes some
unexpected behaviour of the Cray modules. EasyBuild users expect that at run time the versions of the
libraries that are used, are the ones from the modules that are loaded. This is not always the case
for the runtime libraries of the Cray PE. By default the Cray PE will use a default set of libraries
that is determined by the system default version of the Cray PE (which is set by the sysadmins, not
determined by the
cpe module). This can only be avoided by either using\
rpath-linking (which is also special in the Cray PE as the wrappers activate rpath linking if the
CRAY_ADD_RPATH is defined and set to
yes) or by manually adjusting
the library path after loading the modules:
The net result of this feature is that some reproducibility of the results is lost. Programs will
react differently if the system default version of the Cray PE is changed as that will change the
set of default libraries that will be used at runtime unless rpath-linking is used or users
- LUMI documentation: "Developing" section
- The Cray PE is mostly documented through man pages. There used to be some documentation on the Cray web site also but the documentation system got reworked after the merger with HPE. The documentation is now in the HPE Support Centre where it is very difficult to find the right version of the documents.
- The PE-Cray GitHub project also provides some additional documentation, including