Some failed experiments¶
These failed experiments are documented to avoid making the same mistake twice.
Overwriting a module by putting one with the same name and version earlier in the path may not work¶
During the development of the LUMI software stack prototype using the 21.04 CPE release
on eiger it turned out that the cpe/21.04
module had one nasty habbit: By setting
LMOD_MODULERCFILE
with the setenv
LMOD function rather than by prepending or
appending to it using prependpath
/append_path
, it cleared the modulerc.lua
file that the LUMI software setup relies upon.
We tried to cure that by putting a modified copy of the file in a directory earlier in the MODULEPATH and hiding the original Cray module with a
hide_modulefile ( '/opt/cray/pe/lmod/modulefiles/core/cpe/21.04.lua' )
modulerc.lua
file. Though the module was indeed hidden as
module avail
confirmed, and though there was a cpe/21.04.lua
file earlier in
the MODULEPATH, module load cpe/21.04
still loaded the wrong version, even with
LMOD_IGNORE_CACHE=1
.
A clue to what is happening is that when the line that hides the module in modulerc.lua
is removed, you will notice that LMOD marks the one in the Cray core/cpe
subdirectory
as the default and not the one with the same name earlier in the MODULEPATH.
The problem is that
-
When a full name is used, LMOD can still load a hidden module and does so when that module is marked as the default.
-
The CPE does mark one of the cpe modules as the default using a
.version
file in thecore/cpe
subdirectory.
Luckily HPE-Cray uses a .version
file which has the lowest priority. The solution
is to mark a module in the overwrite directory as the default using either the symlink
method or a .modulerc.lua
file in that directory.
A change of visibility of modules is only detected when the module command exits¶
The idea was to write a module cpe/restore-defaults
to restore the defaults rather
than calling a script. This didn't entirely work as expected, likely because LMOD only
assesses visibility at the start of the execution. Whereas changing MODULEPATH
triggers
LMOD to reassess the module tree, making changes to LMOD_MODULERCFILE
doesn't trigger
reassessing the visibility.
To recreate the experiment, take any standard cpe
modulefile that doesn't only use those modules
with the highest version number, e.g., one that uses an older version of gcc, ( and
with the setenv
replaced with append_path
or prepend_path
see the issues with the CPE),
and the following cpe/restore-defaults.lua
file:
modules = {
"PrgEnv-aocc",
"PrgEnv-cray",
"PrgEnv-gnu",
"PrgEnv-intel",
"PrgEnv-nvidia",
"aocc",
"atp",
"cce",
"cray-R",
"cray-ccdb",
"cray-cti",
"cray-dsmml",
"cray-fftw",
"cray-hdf5",
"cray-hdf5-parallel",
"cray-jemalloc",
"cray-libsci",
"cray-mpich",
"cray-netcdf",
"cray-netcdf-hdf5parallel",
"cray-openshmemx",
"cray-parallel-netcdf",
"cray-pmi",
"cray-pmi-lib",
"cray-python",
"cray-stat",
"craype",
"craype-dl-plugin-py3",
"craypkg-gen",
"gcc",
"gdb4hpc",
"iobuf",
"modules",
"nvidia",
"papi",
"perftools-base",
}
if (mode() == "load" or mode() == "show") then
for _,mod in pairs(modules)
do
if (isloaded(mod)) then
unload(mod)
load(mod)
end
end
end
module load cpe/21.04
module load cpe/restore-defaults
module load cpe/21.04
module unload cpe/21.04
module load cpe/restore-defaults
cpe/restore-defaults
triggers the unloading of cpe/21.04
.
In the former, lower versions of modules will remain loaded. However, loading
cpe/restore-defaults
a second time to correct the situation, i.e.,
module load cpe/21.04
module load cpe/restore-defaults
module load cpe/restore-defaults