The CrayPEToolchain EasyBlock and cpeCray/cpeGNU/cpeAMD modules¶
Introduction¶
Our CrayPEToolchain EasyBlock allows for many different scenarios to generate the cpeCray/cpeGNU/cpeAMD modules:
- The cpe module can be loaded first, last or not at all. If it is not loaded at all, it is wise to have a different way of setting the default versions for the Cray PE modules. On LUMI, in the LUMI software stacks, these default versions are already set by the LUMIstack_<yy.mm>_modulerc.lua files.

  In version 21.04 of the CPE, the cpe modules still have several problems, partly due to an LMOD restriction and partly due to bugs in those modules:

  - The cpe modules set LMOD_MODULERCFILE through setenv rather than prepend_path or append_path, so they overwrite any file from another source that sets system-wide defaults and visibility, which is not desirable.

  - A change to LMOD_MODULERCFILE only takes effect the next time a module command is executed. This is a restriction not only of the LMOD 8.3.x version used in the 21.04 CPE, but also of versions in the 8.4 and 8.5 series. Hence loading the cpe/yy.mm module before loading other versionless modules for the CPE components will not have the desired effect of loading the versions for that specific version of the CPE.

  - The cpe modules do contain code to reload already loaded CPE modules in the correct version, but that code is also broken in the 21.04 version: a module that has already been reloaded in the correct version may be reloaded once more through a versionless load, which may bring back the wrong version. This happens because a Lua loop with pairs does not traverse the entries of a Lua table in a fixed order. The order should be such that no module reloads another module that has already been reloaded in the correct version.

- The matching PrgEnv-* module can either be loaded, or its loading can be emulated by setting only the environment variable that this module sets and relying on the cray_targets variable and the dependencies list to load all Cray PE components.

  The reason to avoid loading the PrgEnv-* module is reproducibility. That module depends on a file in /etc to define the components that will be loaded, and that file cannot distinguish between versions of the CPE. Hence any change to that file affects all cpe* modules that EasyBuild may already have generated.

- It is possible to specify target modules via cray_targets. This is a list, just like the dependencies. The targets are loaded after the PrgEnv-* module (if the latter is loaded) but before the other dependencies specified by dependencies. They do not need to be defined in the EasyBuild external modules file. We chose to load them after the PrgEnv-* module to be able to overwrite Cray targeting modules loaded by the latter.

- Dependencies in this case will be external modules. It is possible to specify versions by using, e.g.,

      ('gcc/9.3.0', EXTERNAL_MODULE)

  Versions should be specified if the cpe module is not loaded, even on LUMI. If a user executes

      module load LUMI/21.04 cpeGNU/21.04

  the wrong versions of CPE components might be loaded because of the same LMOD restriction that causes the problems with the Cray cpe/yy.mm modules: the LUMI/yy.mm module adds a file that sets the default versions of CPE components for the requested LUMI software stack and matching CPE version, but those changes only take effect at the following module command, so the cpeGNU/21.04 module loaded in the above example will not yet see the correct default versions of the modules.

  Note also that if versions are specified but the cpe module is loaded at the end, modules might be reloaded in a different version.

- The default values for the various parameters are chosen to generate module files that are as similar as possible to those used at CSCS (or at least those used for their 20.04 environment), but they are not the defaults initially used on LUMI.
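The delayed effect of LMOD_MODULERCFILE described in the list above can be pictured with a toy model. The sketch below is not LMOD code: the class ToyModuleTool, its attributes and the stand-in version string system-default are invented for illustration. It only assumes what the text states, namely that defaults registered during one module command become visible at the next command.

```python
# Toy model only: this is NOT how LMOD is implemented. It mimics one behaviour
# described in the text: default versions are read once, at the start of a
# module command, so defaults registered during that command are only seen
# by the next command.

class ToyModuleTool:
    def __init__(self):
        self.rc_defaults = {}   # defaults registered so far (hypothetical store)
        self.loaded = []        # modules loaded, in order

    def command(self, *requests):
        """Run one 'module load' command with one or more requests."""
        snapshot = dict(self.rc_defaults)   # defaults frozen for this command
        for request in requests:
            if request.startswith("LUMI/"):
                # The stack module registers defaults for the CPE components.
                self.rc_defaults["gcc"] = "9.3.0"
                self.loaded.append(request)
            elif "/" in request:
                self.loaded.append(request)  # versioned load: no lookup needed
            else:
                version = snapshot.get(request, "system-default")
                self.loaded.append(f"{request}/{version}")


# Stack and versionless module requested in a single command.
one_command = ToyModuleTool()
one_command.command("LUMI/21.04", "gcc")

# The same requests split over two commands.
two_commands = ToyModuleTool()
two_commands.command("LUMI/21.04")
two_commands.command("gcc")

print(one_command.loaded)
print(two_commands.loaded)
```

In this model only the two-command variant picks up gcc/9.3.0, which is why hard-coded versions in the dependencies are the safer choice when everything is loaded in one go.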
Supported extra parameters for the EasyConfig files¶
The CrayPEToolchain EasyBlock supports the following parameters:
- PrgEnv: Sets the PrgEnv-* module to load or emulate.

  - The default is to derive the value from the name of the module to generate:

    - PrgEnv = 'cray' for cpeCray
    - PrgEnv = 'gnu' for cpeGNU
    - PrgEnv = 'aocc' for cpeAMD
    - PrgEnv = 'intel' for cpeIntel
    - PrgEnv = 'nvidia' for cpeNVIDIA (not tested as we have no access to a machine with a fully working version of this environment)

  - It is also possible to specify any of these values explicitly, or even a different value for a PrgEnv-* module that is not yet recognized by the EasyBlock.

- PrgEnv_load: Boolean value that indicates whether the PrgEnv-* module should be loaded explicitly (True) or not (False).

  The default is True.

  If you want to hard-code a version, you can do so by specifying the module with the version in the dependencies.

  It is important that all cpe* modules available in the system at the same time are generated with the same setting for PrgEnv_load, as otherwise the conflict resolution between those modules would not work correctly.

- PrgEnv_family:

  - If cpeToolchain, the module will declare itself a member of the cpeToolchain family. If all cpe* modules are generated that way, this ensures that no two different cpe* modules are loaded simultaneously, which wouldn't work correctly anyway with the Cray compiler wrappers.

    If PrgEnv_load is False, it will also force unload all PrgEnv-* modules to ensure that none is loaded. Otherwise it relies on the family mechanism used in the LMOD PrgEnv-* modules to do the job.

    This is the most robust option when explicitly loading a PrgEnv-* module and using LMOD, as LMOD will then ensure that no two cpe* modules are loaded simultaneously and the family mechanism used in the Cray PrgEnv-* modules will do the same for those modules.

  - If PrgEnv, the module will declare itself a member of the PrgEnv family. This generates an error if PrgEnv_load is True, as one cannot load two modules of the same family, but it is the most robust option when using LMOD and emulating the PrgEnv-* module.

    The LMOD family feature will take care of unloading all other PrgEnv-* or cpe* modules as they would conflict with the current module.

  - If None (default), the module will start with unload commands for all known PrgEnv-* and all cpe* modules except itself and the PrgEnv-* module that it uses (if it uses one). This is the only setting that works when TCL-based modules are used, which is why it is the default.

  It is important that all cpe* modules available in the system at the same time are generated with the same setting for PrgEnv_family, as otherwise the conflict resolution between those modules would not work correctly.

- CPE_compiler: Specifies the (versionless) compiler module to load. Possible values are:

  - None (default): Derive the name of the compiler module from the name of the module to generate. This may not yet work for cpeNVIDIA as it is not clear what the name of the compiler module will be.

    It will not add an additional load if that compiler module is already specified in the dependencies.

    Note that this will load the module without specifying the version, so it only makes sense to rely on the autodetect feature if the cpe module is loaded (and if the bugs with that one are fixed).

  - Any other value will be considered the name of the compiler module to load. The module should be versionless. If you want to specify a version, you can do so via dependencies.

    No separate load will be generated if the compiler module is also found in the list of dependencies.

- CPE_version: Version of the cpe module to use (if it is used). Possible values:

  - None (default): Determine the version from the version of the module to generate, i.e., the version parameter in the EasyConfig.

  - Any other value is interpreted as the version to load.

- CPE_load: Possible values:

  - first (default): Load as the very first module. This does not make sense until the LMOD problems with LMOD_MODULERCFILE are fixed.

  - after: Load immediately after loading PrgEnv-* but before loading any other module. This does not make much sense until the LMOD problems with LMOD_MODULERCFILE are fixed, but it could be a way to first load modules the Cray way and then correct them by manually loading the correct versions via the cray_targets and dependencies parameters.

    This value will produce an error message when PrgEnv_load is set to False.

  - last: Load as the last module. This does not make sense until the problems with the cpe module are fully fixed and, on LUMI, until the problem with overwriting LMOD_MODULERCFILE is fixed.

  - None: Do not load the cpe module but rely on the modules specified explicitly in the list of dependencies instead.

- cray_targets: A list of Cray targeting modules to load.

- dependencies: This is a standard EasyConfig parameter. The versions of the selected PrgEnv-*, compiler and craype modules can be specified through dependencies, but those modules will still be loaded according to the scheme below. Any redefinition of the cpe module is discarded.
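Two of the rules in the parameter list lend themselves to a short sketch: the default derivation of PrgEnv from the module name, and the two parameter combinations that the text says produce an error. The helper names effective_prgenv and setting_conflicts, their lower-case arguments and the message strings are hypothetical and are not part of the EasyBlock; the code only restates the rules listed above.

```python
# Hypothetical helpers, not part of the EasyBlock: they restate rules from the text.

# Default PrgEnv value per module name, as listed for the PrgEnv parameter.
DEFAULT_PRGENV = {
    "cpeCray": "cray",
    "cpeGNU": "gnu",
    "cpeAMD": "aocc",
    "cpeIntel": "intel",
    "cpeNVIDIA": "nvidia",
}


def effective_prgenv(name, prgenv=None):
    """Return the explicit PrgEnv value, else the one derived from the module name."""
    if prgenv is not None:
        return prgenv
    return DEFAULT_PRGENV[name]


def setting_conflicts(prgenv_load=True, prgenv_family=None, cpe_load="first"):
    """List the parameter combinations that the text says produce an error."""
    problems = []
    if prgenv_family == "PrgEnv" and prgenv_load:
        problems.append("PrgEnv_family = 'PrgEnv' requires PrgEnv_load = False")
    if cpe_load == "after" and not prgenv_load:
        problems.append("CPE_load = 'after' requires PrgEnv_load = True")
    return problems


print(effective_prgenv("cpeAMD"))
print(setting_conflicts(prgenv_load=False, cpe_load="after"))
```

A check of this kind could be run over a set of EasyConfig files before building them, for example to confirm that all cpe* modules use the same PrgEnv_load and PrgEnv_family settings.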
Order of loads generated by the EasyBlock¶
- The cpe/<CPE_version> module, if CPE_load is first.

  If LMOD were modified to honour changes to LMOD_MODULERCFILE immediately, as it does with changes to MODULEPATH, this would be the best moment to load the cpe module, as all other modules would then be loaded in the correct version immediately.

- The PrgEnv-<PrgEnv> module, if PrgEnv_load is True.

- The cpe/<CPE_version> module, if CPE_load is after.

- The targeting modules specified by cray_targets. Hence they can overwrite the targets set by the PrgEnv-* module, which may be useful on a heterogeneous system that has only a single configuration of the PrgEnv-* modules for all hardware partitions, or to build a cpe* module for cross-compiling.

  Note that changes to the targeting modules may trigger reloads of other modules loaded by the PrgEnv-* module.

- The CPE_compiler module (or the autodetected one), unless PrgEnv-* is loaded explicitly and the module is not in the list of dependencies (in which case we rely on the PrgEnv-* module to do the proper job).

- The craype module (compiler wrappers), unless PrgEnv-* is loaded explicitly and the module is not in the list of dependencies (in which case we rely on the PrgEnv-* module to do the proper job).

- The specified dependencies, minus the cpe/*, PrgEnv-* and craype/* modules.

- The cpe/<CPE_version> module, if CPE_load is last.

  In principle this should reload any previously loaded module whose version does not match the selected Cray PE version, and hence it will also overwrite versions set in the dependencies. However, in the Cray PE 21.04 release (which was used for testing) the module did not always do the reloads in the proper order to ensure the right version, and one might even end up with a version that is neither the one specified in the dependencies nor the one specified by the cpe/* module.
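The load order above can be summarised as a small function. This is a sketch under stated assumptions, not the code of the EasyBlock: the function planned_loads and all argument names are hypothetical, dependencies are given as plain module strings rather than (name, EXTERNAL_MODULE) tuples, and the compiler is assumed to be left out of the final dependency step because it was already loaded earlier.

```python
def base_name(module):
    """Strip an optional version: 'gcc/9.3.0' -> 'gcc'."""
    return module.split("/")[0]


def planned_loads(version, prgenv, compiler, cray_targets=(), dependencies=(),
                  prgenv_load=True, cpe_load="first"):
    """Return the modules in the order described in the text (hypothetical helper)."""
    # A versioned entry in the dependencies replaces the versionless name.
    pinned = {base_name(dep): dep for dep in dependencies}
    prgenv_module = "PrgEnv-" + prgenv
    cpe_module = "cpe/" + version
    loads = []

    if cpe_load == "first":
        loads.append(cpe_module)
    if prgenv_load:
        loads.append(pinned.get(prgenv_module, prgenv_module))
        if cpe_load == "after":
            loads.append(cpe_module)
    loads.extend(cray_targets)
    for module in (compiler, "craype"):
        # Skipped only when PrgEnv-* is loaded and the module is not a dependency.
        if not (prgenv_load and module not in pinned):
            loads.append(pinned.get(module, module))
    # Assumption: the compiler is skipped here too, as it was handled above.
    already_handled = {"cpe", "craype", compiler}
    for dep in dependencies:
        name = base_name(dep)
        if name not in already_handled and not name.startswith("PrgEnv-"):
            loads.append(dep)
    if cpe_load == "last":
        loads.append(cpe_module)
    return loads


targets = ["craype-x86-rome", "craype-accel-host", "craype-network-ofi"]
deps = ["gcc/9.3.0", "craype/2.7.6", "cray-mpich/8.1.4", "xpmem"]
print(planned_loads("21.04", "gnu", "gcc", targets, deps,
                    prgenv_load=False, cpe_load=None))
```

With PrgEnv_load = False and CPE_load = None the result is simply the targeting modules followed by the dependencies, which matches the reproducibility argument made for that setup.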
Some examples¶
Non-working: Load cpe and PrgEnv-gnu¶
This is the default configuration for this EasyBlock.
A minimal EasyConfig (omitting some mandatory parts such as the homepage and description
parameters) is
easyblock = 'CrayPEToolchain'
name = 'cpeGNU'
version = "21.04"
toolchain = SYSTEM
moduleclass = 'toolchain'
The resulting module loads the cpe/21.04 and PrgEnv-gnu modules (in that order).
Unfortunately, this scheme does not work with LMOD 8.3.x, which was part of the Cray PE
stack when the 21.04-21.06 releases were made, nor with versions from the 8.4 and 8.5
branches, as LMOD_MODULERCFILE is only honoured at the next module call. If the effect of
LMOD_MODULERCFILE were immediate, this would probably be the most efficient way of
activating a particular
release of a particular PrgEnv. The module does not belong to any family. Instead it
explicitly unloads other cpe* modules.
Non-working: Load PrgEnv-gnu and then cpe¶
Now we first load a PrgEnv-* module and only subsequently the cpe/yy.mm module
that fixes versions for the modules.
easyblock = 'CrayPEToolchain'
name = 'cpeGNU'
version = "21.04"
toolchain = SYSTEM
PrgEnv_family = 'cpeToolchain'
CPE_load = 'after'
moduleclass = 'toolchain'
The resulting module starts by loading the PrgEnv-gnu module and then corrects the
versions by loading cpe/21.04. This doesn't work
reliably either due to the current design of the module reloading process in the cpe/21.04
module combined with the delayed impact of changes to LMOD_MODULERCFILE.
The module will belong to the cpeToolchain family. That family will take care of
unloading any other cpe* module that would be loaded (provided the PrgEnv_family
parameter was set the same way in their EasyConfigs), while the PrgEnv-gnu module
will take care of unloading other PrgEnv-* modules through the PrgEnv family.
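The reload hazard mentioned above can be illustrated with a toy model. Everything in it is an assumption made for illustration: the stale default versions 2.7.5 and 8.1.3, the idea that loading cray-mpich triggers a versionless load of craype, and the function reload_all itself. It is not the code of the cpe module; it only shows why the order of the reloads matters when versionless loads still resolve to the old defaults.

```python
# Toy model only, not the code of the cpe module.

# Hypothetical stale defaults that a versionless load would still pick up.
STALE_DEFAULT = {"craype": "2.7.5", "cray-mpich": "8.1.3"}
# Versions that the cpe module is supposed to establish.
WANTED = {"craype": "2.7.6", "cray-mpich": "8.1.4"}
# Assumption: loading cray-mpich performs a versionless load of craype.
VERSIONLESS_LOADS = {"cray-mpich": ["craype"], "craype": []}


def reload_all(order):
    """Reload every module in the given order and return the final versions."""
    loaded = dict(STALE_DEFAULT)
    for name in order:
        loaded[name] = WANTED[name]
        for other in VERSIONLESS_LOADS[name]:
            # The versionless load resolves to the stale default.
            loaded[other] = STALE_DEFAULT[other]
    return loaded


bad_order = reload_all(["craype", "cray-mpich"])
good_order = reload_all(["cray-mpich", "craype"])
print(bad_order)
print(good_order)
```

Only the second order leaves both modules at the wanted version. A loop over a table without a guaranteed traversal order can produce either outcome.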
A setup without PrgEnv-* or cpe module¶
On LUMI, due to the problems with LMOD and the cpe modules, we currently use a setup
without PrgEnv-* or cpe module. One of the functions of the cpe module,
setting the default versions of the Cray PE components, is already done by the LUMI
module that loads the software stack. The other is replaced by hard-coding the necessary
versions in the EasyConfig. One of the functions of the PrgEnv-* modules, setting
an environment variable that tells the compiler wrappers which PE is selected, is
taken over by the EasyBlock, which sets the variable in the module file that it generates.
The other, loading the correct targets and other PE modules, is taken over by the cray_targets
parameter and the dependency list. This is the most reproducible setup as it only depends
on versioned components (the partition module already ensures that a particular version
of the Cray targeting modules is made available).
easyblock = 'CrayPEToolchain'
name = 'cpeGNU'
version = "21.04"
toolchain = SYSTEM
PrgEnv_load = False
PrgEnv_family = 'PrgEnv'
CPE_load = None
cray_targets = [
    'craype-x86-rome',
    'craype-accel-host',
    'craype-network-ofi'
]
dependencies = [
    ('gcc/9.3.0', EXTERNAL_MODULE),
    ('craype/2.7.6', EXTERNAL_MODULE),
    ('cray-mpich/8.1.4', EXTERNAL_MODULE),
    ('cray-libsci/21.04.1.1', EXTERNAL_MODULE),
    ('cray-dsmml/0.1.4', EXTERNAL_MODULE),
    ('perftools-base/21.02.0', EXTERNAL_MODULE),
    ('xpmem', EXTERNAL_MODULE),
]
moduleclass = 'toolchain'
The cpeGNU module generated by this EasyConfig will be unloaded if the user loads
a PrgEnv-* module, as it is also a member of the PrgEnv family. As such
it is a full replacement of the Cray PrgEnv-gnu module.
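Because this setup loads neither the cpe module nor a PrgEnv-* module, a dependency without a version resolves to whatever default is active when the module is loaded. The sketch below lists such entries. The helper versionless is hypothetical, and EXTERNAL_MODULE is defined here only as a stand-in string so that the snippet runs outside EasyBuild.

```python
# Stand-in for the EasyBuild constant, defined only so the sketch runs on its own.
EXTERNAL_MODULE = "EXTERNAL_MODULE"


def versionless(dependencies):
    """Return the names of dependency entries that carry no version."""
    return [name for name, _marker in dependencies if "/" not in name]


# Dependency list copied from the EasyConfig of this setup.
dependencies = [
    ('gcc/9.3.0', EXTERNAL_MODULE),
    ('craype/2.7.6', EXTERNAL_MODULE),
    ('cray-mpich/8.1.4', EXTERNAL_MODULE),
    ('cray-libsci/21.04.1.1', EXTERNAL_MODULE),
    ('cray-dsmml/0.1.4', EXTERNAL_MODULE),
    ('perftools-base/21.02.0', EXTERNAL_MODULE),
    ('xpmem', EXTERNAL_MODULE),
]

print(versionless(dependencies))
```

For the list shown here the only entry reported is xpmem. Whether a versionless entry is acceptable is a decision for the site; the check merely makes such entries visible.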
Loading the cpe and PrgEnv modules first, then reloading packages just to be sure¶
The following setup is a compromise that works around the problems with LMOD and the cpe
modules yet retains much of the spirit of the Cray PE. It can also correct the targeting
modules should the PrgEnv-* module not load the ones that you want, and it ensures that
certain other modules are loaded even if an update of the system would remove them from
the list of modules loaded by PrgEnv-gnu:
easyblock = 'CrayPEToolchain'
name = 'cpeGNU'
version = '21.04'
toolchain = SYSTEM
CPE_load = 'first'
PrgEnv_load = True
PrgEnv_family = 'cpeToolchain'
cray_targets = [
    'craype-x86-rome',
    'craype-accel-host',
    'craype-network-ofi'
]
dependencies = [
    ('PrgEnv-gnu/8.0.0', EXTERNAL_MODULE),
    ('gcc/9.3.0', EXTERNAL_MODULE),
    ('craype/2.7.6', EXTERNAL_MODULE),
    ('cray-mpich/8.1.4', EXTERNAL_MODULE),
    ('cray-libsci/21.04.1.1', EXTERNAL_MODULE),
    ('cray-dsmml/0.1.4', EXTERNAL_MODULE),
    ('perftools-base/21.02.0', EXTERNAL_MODULE),
    ('xpmem', EXTERNAL_MODULE),
]
moduleclass = 'toolchain'
The resulting module first loads the cpe/21.04 and PrgEnv-gnu/8.0.0 modules to stay in
the Cray PE spirit. Next the indicated targeting modules will be loaded, one for the
CPU, one for the accelerator architecture and one for the network. This may trigger
reloads of some other modules and will overwrite targeting modules of the same type
loaded by PrgEnv-gnu. Finally, the gcc compiler module, the craype module and all
other modules from the dependency list are loaded with the versions specified.
This setup is a compromise that on one hand stays close to the Cray PE spirit by using
the cpe and PrgEnv-gnu modules, yet works around some problems, namely:
* Setting LMOD_MODULERCFILE does not work immediately.
* Any corrective action when loading cpe after PrgEnv-gnu does not work reliably.
* On a heterogeneous cluster, the targeting modules loaded by PrgEnv-gnu may
not be the ones you want when cross-compiling or when the system would use the same
file defining the modules for the whole system.
* The list of modules loaded by PrgEnv-gnu may change as it is determined by
a single file on the system that does not depend on the version of the Cray PE.
In this case, you can always be sure that at least the modules mentioned in the
dependency list and cray_targets parameter will be loaded.
A variant of this would set CPE_load = 'after' which would load the cpe/21.04
module immediately after loading PrgEnv-gnu rather than just before, but with the
current flaws of the cpe/21.04 module this still does not solve all problems:
easyblock = 'CrayPEToolchain'
name = 'cpeGNU'
version = '21.04'
toolchain = SYSTEM
CPE_load = 'after'
PrgEnv_load = True
PrgEnv_family = 'cpeToolchain'
cray_targets = [
    'craype-x86-rome',
    'craype-accel-host',
    'craype-network-ofi'
]
dependencies = [
    ('PrgEnv-gnu/8.0.0', EXTERNAL_MODULE),
    ('gcc/9.3.0', EXTERNAL_MODULE),
    ('craype/2.7.6', EXTERNAL_MODULE),
    ('cray-mpich/8.1.4', EXTERNAL_MODULE),
    ('cray-libsci/21.04.1.1', EXTERNAL_MODULE),
    ('cray-dsmml/0.1.4', EXTERNAL_MODULE),
    ('perftools-base/21.02.0', EXTERNAL_MODULE),
    ('xpmem', EXTERNAL_MODULE),
]
moduleclass = 'toolchain'
Mimic PrgEnv, load hardcoded versions but load cpe/yy.mm first¶
This is yet another compromise scenario:
* Loading cpe/yy.mm first ensures that further modules a user might load after
loading the cpe* module will load in the proper versions if a user does a versionless
load.
* Mimicking PrgEnv-* and loading modules explicitly ensures reproducibility over
time as the list of modules loaded does not depend on a single file elsewhere in
the system configuration which is not specific to a particular release of the PE.
* Hard-coding the versions ensures that we avoid the problems caused by the implementation
of the cpe/yy.mm modules (certainly in releases up to and including 21.06).
easyblock = 'CrayPEToolchain'
name = 'cpeGNU'
version = '21.04'
toolchain = SYSTEM
PrgEnv_load = False
PrgEnv_family = 'PrgEnv'
CPE_load = 'first'
cray_targets = [
    'craype-x86-rome',
    'craype-accel-host',
    'craype-network-ofi'
]
dependencies = [
    ('PrgEnv-gnu/8.0.0', EXTERNAL_MODULE),
    ('gcc/9.3.0', EXTERNAL_MODULE),
    ('craype/2.7.6', EXTERNAL_MODULE),
    ('cray-mpich/8.1.4', EXTERNAL_MODULE),
    ('cray-libsci/21.04.1.1', EXTERNAL_MODULE),
    ('cray-dsmml/0.1.4', EXTERNAL_MODULE),
    ('perftools-base/21.02.0', EXTERNAL_MODULE),
    ('xpmem', EXTERNAL_MODULE),
]
moduleclass = 'toolchain'
Mimic PrgEnv and load cpe/yy.mm at the end¶
This would be a valid scenario once the cpe/yy.mm modules have been corrected and
work as they should. In this scenario,
- We mimic PrgEnv-* by setting the necessary environment variables and then loading a list of versionless modules. This avoids a problem with the actual PrgEnv-* modules: the list of modules they load depends on a single system file that is the same for all releases of the PE and hence may change over time.

- At the end the relevant cpe/yy.mm module is loaded to fix the versions of all already loaded modules.
The corresponding EasyConfig file (minus help etc.) is:
easyblock = 'CrayPEToolchain'
name = 'cpeGNU'
version = '21.04'
toolchain = SYSTEM
PrgEnv_load = False
PrgEnv_family = 'PrgEnv'
CPE_load = 'last'
cray_targets = [
    'craype-x86-rome',
    'craype-accel-host',
    'craype-network-ofi'
]
dependencies = [
    ('gcc', EXTERNAL_MODULE),
    ('craype', EXTERNAL_MODULE),
    ('cray-mpich', EXTERNAL_MODULE),
    ('cray-libsci', EXTERNAL_MODULE),
    ('cray-dsmml', EXTERNAL_MODULE),
    ('perftools-base', EXTERNAL_MODULE),
    ('xpmem', EXTERNAL_MODULE),
]
moduleclass = 'toolchain'