The CrayPEToolchain EasyBlock and cpeCray/cpeGNU/cpeAMD modules¶
Introduction¶
Our CrayPEToolchain EasyBlock allows for many different scenarios to generate the cpeCray/cpeGNU/cpeAMD modules:
-
The
cpe
module can be loaded first, last or not at all. Note that if the module is not loaded at all, it may be wise to have a different way of setting the default versions for the Cray PE modules.On LUMI, in the LUMI software stacks, these default versions are already set by the
LUMIstack_<yy.mm>_modulerc.lua
files.In version 21.04 of the CPE, the
cpe
modules still have several problems, partly due to an LMOD restriction and partly due to bugs in those modules:-
The
cpe
modules setLMOD_MODULERCFILE
throughsetenv
rather thanprepend_path
orappend_path
so they overwrite any file that sets system-wide defaults and visibility from other sources, which is not desirable. -
A change to
LMOD_MODULERCFILE
has only effect the next time a module command is executed. This is a restriction not only of LMOD version8.3.x
used in the 21.04 CPE, but also of versions in the 8.4 and 8.5 series. Hence loading thecpe/yy.mm
module before loading other versionless modules for the CPE components will not have the desired effect of loading the versions for that specific version of the CPE. -
The
cpe
modules do contain code to reload already loaded modules from the CPE in the correct version, but that code is also broken in the21.04
version as the modules may be loaded in an order in which a module that has already been reloaded in the correct version, gets reloaded once more with a versionless load, which may reload the wrong version. This is because the LUA loop withpairs
doesn't have a fixed order of going over the entries in the LUA table. The order should be such that no module reloads an other module that has already been reloaded in the correct version.
-
-
The matching
PrgEnv-*
module can either be loaded, or its loading can just be emulated by only setting the environment variable that this module sets, but relying on thecray_targets
variable and dependencies list to load all Cray PE components.The reason to avoid loading the
PrgEnv-*
module is reproducibility. That module depends on a file in/etc
to define the components that will be loaded, and that file cannot distinguish between versions of the CPE. Hence if changes to that file would be made, it has an effect on the working of allcpe*
modules that EasyBuild may already have generated. -
It is possible to specify target modules via
cray_targets
. This is a list just as the dependencies. They will be loaded after thePrgEnv-*
module (if the latter is loaded) but before other dependencies specified bydependencies
. They do not need to be defined in the EasyBuild external modules file. We chose to load them after thePrgEnv-*
module (if the latter is loaded) to be able to overwrite Cray targeting modules loaded by the latter. -
Dependencies in this case will be external modules. It is possible to specify versions by using, e.g.,
Versions should be specified if the( 'gcc/9.3.0', EXTERNAL_MODULE)
cpe
module is not loaded, even on LUMI, as if a user would executethe wrong versions of CPE components might be loaded because of the same LMOD restriction that causes the problems with the Craymodule load LUMI/21.04 cpeGNU/21.04
cpe/yy.mm
modules: TheLUMI/yy.mm
module will add a file that sets the default versions of CPE compoments for the requested LUMI software stack and matching CPE version, but those changes only take effect at the followingmodule
command, so thecpeGNU/21.04
module which is loaded in the above example will not yet see the correct default versions of the modules.Note also that if versions are specified but the
cpe
module is loaded at the end, modules might be reloaded in a different version. -
The default value for various parameters is chosen to generate module files that are as similar as possible to those used ast CSCS (or at least those used for their 20.04 environment), but are not the defaults initially used on LUMI.
Supported extra parameters for the EasyConfig files¶
The CrayPEToolchain
EasyBlock supports the following parameters:
-
PrgEnv
: Sets thePrgEnv-*
module to load or emulate.- The default is to derive the value from the name of the module to generate:
PrgEnv = 'cray'
forcpeCray
PrgEnv = 'gnu'
forcpeGNU
PrgEnv = 'aocc'
forcpeAMD
PrgEnv = 'intel'
forcpeIntel
PrgEnv = 'nvidia'
forcpeNVIDIA
(not tested as we have no access to a machine with a fully working version of this environment)
- It is also possible to specify any of these values, or even a different value for a
PrgEnv-*
module that is not yet recognized by the EasyBlock.
- The default is to derive the value from the name of the module to generate:
-
PrgEnv_load
: Boolean value, indicate if thePrgEnv
module should be loaded explicitly (if True) or not (if False).Default is
True
.If you want to hard-code a version, you can do so by specifying the module with the version in the dependencies.
It is important that all
cpe*
modules available in the system at the same time are also generated with the same setting forPrgEnv_load
as otherwise the conflict resolution between those modules would not work correctly. -
PrgEnv_family
:-
If
cpeToolchain
, the module will declare itself a member of thecpeToolchain
family. If allcpe*
modules are generated that way, this will ensure that no two differentcpe*
modules will be loaded simulataneously, which wouldn't work correctly anyway with the Cray compiler wrappers.If
PrgEnv_load
is false, it will also force unload allPrgEnv-*
modules to ensure that none is loaded. Otherwise it relies on the family-mechanism used in the LMODPrgEnv-*
modules to do the job.This is the most robust option when explicitly loading a
PrgEnv-*
module and using LMOD as LMOD will then ensure that no twocpe*
modules will be loaded simultaneously and the family mechanism used in the CrayPrgEnv-*
modules will do the same for those modules. -
If
PrgEnv
, the module will declare itself a member of thePrgEnv-
family. This will generate an error ifPrgEnv_load
is True as one cannot load two modules of the same family but is the most robust ootion when using LMOD and emulating thePrgEnv-*
module.The LMOD family feature will take care of unloading all other
PrgEnv-*
orcpe*
modules as they would conflict with the current module. -
If
None
(default), which is the only setting that works when TCL-based modules are used and is therefore the default, the module will start with unload commands for all knownPrgEnv-*
and allcpe*
modules except itself and thePrgEnv-*
module that it uses (if it uses one).
It is important that all
cpe*
modules available in the system at the same time are also generated with the same setting forPrgEnv_family
as otherwise the conflict resolution between those modules would not work correctly. -
-
CPE_compiler
specifies the (versionless) compiler to load. Possible values are:-
None
(default): Derive the name of the compiler module from the name of the module to generate. This may not yet work forcpeNVIDIA
as it is not clear what the name of the compiler module will be.If will not add an additional load if that compiler module is already specified in the dependencies.
Note that this will load the module without specifying the version, so it only makes sense to rely on the autodetect feature if the
cpe
module is loaded (and if the bugs with that one are fixed). -
Any other value will be considered the name of the compiler module to load. The module should be versionless. If you want to specify a version, you can do so via
dependencies
.No separate load will be generated if the compiler module is also found in the list of dependencies.
-
-
CPE_version
: Version of the cpe module to use (if it is used). Possible values:-
None
(default): Determine the version from the version of the module to generate, i.e., theversion
parameter in the EasyConfig. -
Any other value is interpreted as the value to load.
-
-
CPE_load
: Possible values:-
first
(default): Load as the very first module. This does not make sense until the LMOD problems withLMOD_MODULERCFILE
are fixed. -
after
: Load immediately after loadingPrgEnv-*
but before loading any other module. This does not make too much sense until the LMOD problems withLMOD_MODULERCFILE
are fixed, but it could be a way to first load modules the Cray way and then correct by manually loading correct versions via thecray_targets
anddependencies
parameters.This value will produce an error message when
PrgEnv_load
is set toFalse
. -
last
: Load as the last module. Currently this does not make sense until the problems with thecpe
module are fully fixed, and on LUMI, until the problem with overwritingLMOD_MODIULERCFILE
is fixed. -
None
: Do not load thecpe
module but rely on explicit dependencies specified in the list of dependencies instead.
-
-
cray_targets
: A list of Cray targetting modules to load. -
dependencies
: This is a standard EasyConfig parameter. The versions of the selected PrgEnv, compiler andcraype
module can be specified through dependencies but those modules will still be loaded according to the scheme below. Any redifinition of thecpe
module is discarded.
Order of loads generated by the EasyBlock¶
-
The
cpe/<CPE_version>
module, ifCPE_load
isfirst
.If LMOD would be modified to honour changes to
LMOD_MODULERCFILE
immediately as it does with changes toMODULEPATH
, this would be the best moment to load thecpe
module as it ensures that all other packages would be loaded with the correct version number immediately. -
The
PrgEnv-<PrgEnv>
module, ifPrgEnv_load
is True. -
The targeting modules specified by
cray_targets
. Hence they can overwrite the targets set by thePrgEnv-*
module which may be usefull on a heterogeneous system should there only be a single configuration for thePrgEnv-*
modules for all hardware partitions in the system, or to build acpe*
module for cross-compiling.Note that changes to the targeting modules may trigger reloads of other modules loaded by the
PrgEnv-*
module. -
The
CPE_compiler
module (or autodected one), unless both PrgEnv-* is loaded explicitly and the module is not in the list of dependencies (in which case we rely on thePrgEnv-*
module to do the proper job). -
The craype module (compiler wrappers), unless both PrgEnv-* is loaded explicitly and the module is not in the list of dependencies (in which case we rely on the
PrgEnv-*
module to do the proper job). -
The specified dependencies, minus the
cpe/*
,PrgEnv-*
andcraype/*
modules. -
The
cpe/<CPE_version>
module, ifCPE_load
islast
.In principle this should reload any module loaded before in a version that does not match the selected Cray PE version, and hence will also overwrite versions set in the dependencies. However, in the Cray PE 21.04 release (which was used for testing) the module did not always do the reloads in the proper order to always ensure the right version, and one might even end up with a version that is neither the one specified in the dependencies nor the one specified by the
cpe/*
module.
Some examples¶
Non-working: Load cpe and PrgEnv-gnu¶
This is the default configuration for this EasyBlock.
A minimal EasyConfig (omitting some mandatory parts such the homepage
and description
parameters) is
easyblock = 'CrayPEToolchain'
name = 'cpeGNU'
version = "21.04"
toolchain = SYSTEM
moduleclass = 'toolchain'
cpe/21.04
and PrgEnv-gnu
-modules (in that order). Unfortunately, this scheme
does not work with LMOD 8.3.x as is part of the Cray PE stack when the 21.04-21.06 releases
were made, nor with version from the 8.4 and 8.5 branches, as LMOD_MODULERCFILE is only
honoured at the next module
call. If the effect of LMOD_MODULERCFILE
would
be immediate, this would probably be the most efficient way of activating a particular
release of a particular PrgEnv. The module does not belong to any family. Instead it
explicitly unloads other cpe*
modules.
Non-working: Load PrgEnv-gnu and then cpe¶
Now we first load a PrgEnv-*
module and only subsequently the cpe/yy.mm
module
that fixes versions for the modules.
easyblock = 'CrayPEToolchain'
name = 'cpeGNU'
version = "21.04"
toolchain = SYSTEM
PrgEnv_family = 'cpeToolchain'
CPE_load = 'after'
moduleclass = 'toolchain'
PrgEnv-gnu
module and then correcting the versions by loading cpe/21.04
. This doesn't work
reliably either due to the current design of the module reloading process in the cpe/21.04
module combined with the delayed impact of changes to LMOD_MODULERCFILE
.
The module will belong to the cpeToolchain
family. That family will take care of
unloading any other cpe*
module that would be loaded (provided the PrgEnv_family
parameter was set the same way in their EasyConfigs), while the PrgEnv-gnu
module
will take care of unloading other PrgEnv-*
modules through the PrgEnv
family.
A setup without PrgEnv- or cpe module¶
On LUMI, due to the problems with LMOD and the cpe modules, we currently use a setup
without PrgEnv-*
or cpe
module. One of the functions of the cpe
module,
setting the default versions of the Cray PE components, is already done by the LUMI
module that loads the software stack. The other is replaced by hard-coding the necessary
versions in the EasyConfig. One of the functions of the PrgEnv-*
modules, setting
and environment variable that tells the compiler wrappers which PE is selected, is
taken over by the EasyBlock which sets the variable in the module file that it generates.
The other, loading the correct targets and other PE modules, is taken over by the craype_targets
parameter and the dependency list. This is the most reproducible setup as it only depends
on versioned components (the partition
module already ensures that a particular version
of the Cray targeting modules is made available).
easyblock = 'CrayPEToolchain'
name = 'cpeGNU'
version = "21.04"
toolchain = SYSTEM
PrgEnv_load = False
PrgEnv_family = 'PrgEnv'
CPE_load = None
cray_targets = [
'craype-x86-rome',
'craype-accel-host',
'craype-network-ofi'
]
dependencies = [
('gcc/9.3.0', EXTERNAL_MODULE),
('craype/2.7.6', EXTERNAL_MODULE),
('cray-mpich/8.1.4', EXTERNAL_MODULE),
('cray-libsci/21.04.1.1', EXTERNAL_MODULE),
('cray-dsmml/0.1.4', EXTERNAL_MODULE),
('perftools-base/21.02.0', EXTERNAL_MODULE),
('xpmem', EXTERNAL_MODULE),
]
moduleclass = 'toolchain'
cpeGNU
module generated by this EasyConfig will be unloaded if the user would
load a PrgEnv-*
module as it is also a member of the PrgEnv
family. As such
it is a full replacement of the Cray PrgEnv-gnu
module.
Loading the cpe and PrgEnv modules first, then reloading packages just to be sure¶
A compromise solution that will work around the problems with LMOD and the cpe
modules yet retain much of the spirit of the Cray PE, and that also can correct the
targeting modules should the PrgEnv-*
module not take the ones that you want
(or ensure that at least certain other modules are loaded, even if they would be
removed from the list of modules loaded by PrgEnv-gnu
in an update of the system), is
the following setup:
easyblock = 'CrayPEToolchain'
name = 'cpeGNU'
version = '21.04'
toolchain = SYSTEM
CPE_load = 'first'
PrgEnv_load = True
PrgEnv_family = 'cpeToolchain'
cray_targets = [
'craype-x86-rome',
'craype-accel-host',
'craype-network-ofi'
]
dependencies = [
('PrgEnv-gnu/8.0.0', EXTERNAL_MODULE),
('gcc/9.3.0', EXTERNAL_MODULE),
('craype/2.7.6', EXTERNAL_MODULE),
('cray-mpich/8.1.4', EXTERNAL_MODULE),
('cray-libsci/21.04.1.1', EXTERNAL_MODULE),
('cray-dsmml/0.1.4', EXTERNAL_MODULE),
('perftools-base/21.02.0', EXTERNAL_MODULE),
('xpmem', EXTERNAL_MODULE),
]
moduleclass = 'toolchain'
cpe/21.04
and PrgEnv-gnu/8.0.0
modules to stay in
the Cray PE spirit. Next the indicated targeting modules will be loaded, one for the
CPU, one for the accelerator architecture and one for the network. This may trigger
reloads of some other modules and will overwrite targeting modules of the same type
loaded by PrgEnv-gnu
. Finally, the gcc compiler module, the craype module and all
other modules from the dependency list are loaded with the versions specified.
This setup is a compromise that on one hand stays close to the Cray PE spirit by using
the cpe
and PrgEnv-gnu
modules, yet works around some problems, namely:
* Setting LMOD_MODULERCFILE
does not work immediately.
* Any corrective action when loading cpe
after PrgEnv-gnu
does not work
* On a heterogeneous cluster, the targeting modules loaded by PrgEnv-gnu
may
not be the ones you want when cross-compiling or when the system would use the same
file defining the modules for the whole system.
* The list of modules loaded by PrgEnv-gnu
may change as it is determined by
a single file on the system that does not depend on the version of the Cray PE.
In this case, you can always be sure that at least the modules mentioned in the
dependency list and cray_targets
parameter will be loaded.
A variant of this would set CPE_load = 'after'
which would load the cpe/21.04
module immediately after loading PrgEnv-gnu
rather than just before, but with the
current flaws of the cpe/21.04
module this still does not solve all problems:
easyblock = 'CrayPEToolchain'
name = 'cpeGNU'
version = '21.04'
toolchain = SYSTEM
CPE_load = 'after'
PrgEnv_load = True
PrgEnv_family = 'cpeToolchain'
cray_targets = [
'craype-x86-rome',
'craype-accel-host',
'craype-network-ofi'
]
dependencies = [
('PrgEnv-gnu/8.0.0', EXTERNAL_MODULE),
('gcc/9.3.0', EXTERNAL_MODULE),
('craype/2.7.6', EXTERNAL_MODULE),
('cray-mpich/8.1.4', EXTERNAL_MODULE),
('cray-libsci/21.04.1.1', EXTERNAL_MODULE),
('cray-dsmml/0.1.4', EXTERNAL_MODULE),
('perftools-base/21.02.0', EXTERNAL_MODULE),
('xpmem', EXTERNAL_MODULE),
]
moduleclass = 'toolchain'
Mimic PrgEnv, load hardcoded versions but load cpe/yy.mm first¶
This is yet another compromise scenario:
* Loading cpe/yy.mm
first ensures that further modules a user might load after
loading the cpe*
module will load in the proper versions if a user does a versionless
load.
* Mimicing PrgEnv-*
and loading modules explicitly ensures reproducibility over
time as the list of modules loaded does not depend on a single file elsewhere in
the system configuration which is not specific to a particular release of the PE.
* Hard-coding the versions ensures that we avoid the problems caused by the implementation
of the cpe/yy.mm
modules (certainly in releases up to and including 21.06)
easyblock = 'CrayPEToolchain'
name = 'cpeGNU'
version = '21.04'
toolchain = SYSTEM
PrgEnv_load = False
PrgEnv_family = 'PrgEnv'
CPE_load = 'first'
cray_targets = [
'craype-x86-rome',
'craype-accel-host',
'craype-network-ofi'
]
dependencies = [
('PrgEnv-gnu/8.0.0', EXTERNAL_MODULE),
('gcc/9.3.0', EXTERNAL_MODULE),
('craype/2.7.6', EXTERNAL_MODULE),
('cray-mpich/8.1.4', EXTERNAL_MODULE),
('cray-libsci/21.04.1.1', EXTERNAL_MODULE),
('cray-dsmml/0.1.4', EXTERNAL_MODULE),
('perftools-base/21.02.0', EXTERNAL_MODULE),
('xpmem', EXTERNAL_MODULE),
]
moduleclass = 'toolchain'
Mimic PrgEnv and load cpe/yy.mm at the end¶
This would be a valid scenario once the cpe/yy.mm
modules have been corrected and
work as they should. In this scenario,
-
We mimic
PrgEnv-*
by setting the necessary environment variables and then loading a list of versionless modules. This avoids a problem with the actual PrgEnv modules as the list of modules they load depends on a single system file which is the same for all releases of the PE and hence may change over time. -
At the end the relevant
cpe/yy.mm
module is loaded to fix the versions of all already loaded modules.
The corresponding EasyConfig file (minus help etc.) is:
easyblock = 'CrayPEToolchain'
name = 'cpeGNU'
version = '21.04'
toolchain = SYSTEM
PrgEnv_load = False
PrgEnv_family = 'PrgEnv'
CPE_load = 'last'
cray_targets = [
'craype-x86-rome',
'craype-accel-host',
'craype-network-ofi'
]
dependencies = [
('gcc', EXTERNAL_MODULE),
('craype', EXTERNAL_MODULE),
('cray-mpich', EXTERNAL_MODULE),
('cray-libsci', EXTERNAL_MODULE),
('cray-dsmml', EXTERNAL_MODULE),
('perftools-base', EXTERNAL_MODULE),
('xpmem', EXTERNAL_MODULE),
]
moduleclass = 'toolchain'