R

License information

R itself is available as free software under the GNU General Public License; see the "COPYING" page on the R web site.

Several of the EasyConfigs, however, provide additional packages from CRAN and other sources that may be licensed differently. It is the user's responsibility to ensure that all packages can be legally used, as some may have licenses that restrict use.

User documentation

We provide EasyBuild recipes both for a plain R installation with only the standard packages and Rmpi (the -raw versions; Rmpi is included because it needs special options to install on LUMI), and for a regular version with a number of packages for parallel computing already included.

Known restrictions

It should be said that R was never developed for parallel computing. Parallel computing is not an intrinsic part of the language; it is only added through a mess of packages. Moreover, the packages for parallel computing often evolved from multicore computing on a workstation, or from distributed computing on a network of workstations not managed by a scheduler. As a result, some packages are not fully functional on LUMI.

Some known restrictions:

  • Rmpi: mpi.spawn.Rslaves is not supported as Cray MPI does not support MPI_Comm_spawn.

  • parallel: detectCores detects the total number of (virtual) cores, not the number of cores available to the application. This happens with both Cray R and R built with EasyBuild.

    A solution is to use the availableCores() function from the parallelly package. That package can, e.g., recognize the CPU set a program will be running in when started through srun (see the minimal sketch after this list).

    Because packages are often not sophisticated enough to recognise that they are not running on the full node, some packages using shared memory multiprocessing can be expected to launch one thread per virtual core rather than one thread per available core. This can lead to heavy oversubscription of the cores in your job and very bad parallel performance.

  • We have seen issues with some linear algebra operations when running in multithreaded mode. Two workarounds seem to help:

    • Setting OMP_STACKSIZE=256M (and exporting this variable), which we have implemented in the module for the 24.03 versions, and

    • using ulimit -s 300000

    The problem has been reported to HPE Cray. It might be fixed in the 25.03 release of the programming environment.

    Note that the issue also occurs with the cray-R module, but there you will also have to set OMP_STACKSIZE by hand, as we cannot change those modules.
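
Returning to the restriction on detectCores above: the following minimal sketch, using only functions already mentioned in this section, can be run inside a job step to compare the two calls. With, e.g., --cpus-per-task=8, the first call typically reports all (virtual) cores of the node while the second reports 8.

library(parallel)
library(parallelly)

# Total number of (virtual) cores in the node, ignoring the Slurm allocation:
detectCores()

# Number of cores actually granted to this job step (CPU set, cgroups, ...):
availableCores()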

Some noteworthy packages included in the regular EasyConfig

  • Shared memory parallel computing:

    • parallelly, with improvements over the standard parallel package that make it more aware of the typical environment on an HPC cluster.
  • Distributed memory computing:

    • Rmpi, built to interface properly with Cray MPICH.

    • snow, Simple Network Of Workstations.

    • snowfall

  • Unified shared and distributed memory computing:

  • Interfacing with Slurm

Some examples

These examples are inspired by the examples in the LRZ documentation, section "Parallelization Using R".

A shared memory (multicore) computing example with the parallel and parallelly packages

Example R script that you can store in script_parallel.R:

library(parallel)
library(parallelly)

ac <- detectCores()
sprintf( "Number of cores according to parallel::detectCores: %d", ac )
nc <- availableCores()
sprintf( "Number of cores according to parallelly::availableCores: %d", nc )

print( "lapply:" )
system.time(
    lapply(1:20, function(x) sum(sort(runif(1e7))))
)

print( "mclapply with mc.cores = 1:" )
system.time(
    mclapply(1:20, function(x) sum(sort(runif(1e7))), mc.cores = 1)
)

sprintf( "mclapply with mc.cores = %d:", nc )
system.time(
    mclapply(1:20, function(x) sum(sort(runif(1e7))), mc.cores = nc)
)

print( "mclapply without mc.cores argument:" )
system.time(
    mclapply(1:20, function(x) sum(sort(runif(1e7))))
)
print( "Not sure what is happening here as user time goes down but elapsed goes up." )

Example submit script to store in submit.slurm:

#!/bin/bash
#SBATCH --job-name=R_parallel_test
#SBATCH --partition=small
#SBATCH --time=2:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --output=%x_%j.txt
#SBATCH --account=project_46YXXXXXX

module load LUMI/23.09 partition/C R/4.3.2-cpeGNU-23.09

echo -e "Running Rscript from $(which Rscript).'\n"

srun Rscript script_parallel.R

The srun command doesn't seem to make a difference here, but it usually ensures that all Slurm flags, including hints for multithreading, are correctly applied to the executable being started.

Note the difference between the detectCores routine from parallel and the availableCores routine from parallelly. The former fails to correctly detect the resources available for R while the latter does detect that there are only 8 CPUs available.

This script was written by someone who is not an R expert, and it is not entirely clear what happens in the last call to mclapply, where the user time goes down while the elapsed time goes up. Note, however, that when the mc.cores argument is omitted, mclapply falls back on getOption("mc.cores", 2L), i.e., only two workers, which at least explains why the elapsed time is higher than in the run with mc.cores = nc.
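
A possible way, not used in the example above, to make mclapply pick up the Slurm allocation by default is to set the mc.cores option early in the script:

library(parallelly)

# Subsequent mclapply() calls without an explicit mc.cores argument will now
# use the number of allocated cores instead of the built-in default of 2.
options(mc.cores = availableCores())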

A distributed memory computing example with the snow package

Example R script script_snow.R:

library(snow)

# Started through RMPISNOW, so makeCluster() without arguments picks up the
# cluster of MPI tasks that srun/RMPISNOW has already set up.
cl <- makeCluster()

system.time(
    parLapply(cl, 1:167, function(x){
        sum(sort(runif(1e7)))
        }
    )
)

stopCluster(cl)

q(save="no")

Example submit script submit.slurm:

#!/usr/bin/bash
#SBATCH --job-name=snow_test
#SBATCH --partition=small
#SBATCH --time=5:00
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1
#SBATCH --hint=nomultithread
#SBATCH --output=%x_%j.txt
#SBATCH --account=project_46YXXXXXX

module load LUMI/23.09 partition/C
module load R/4.3.2-cpeGNU-23.09

echo -e "\\n# Running on 2 task\n\n"
srun -W 10 -n 2 $EBROOTR/lib/R/library/snow/RMPISNOW <script_snow.R

echo -e "\\n# Running on 3 task\n\n"
srun -W 10 -n 3 $EBROOTR/lib/R/library/snow/RMPISNOW <script_snow.R

echo -e "\\n# Running on 4 tasks\n\n"
srun -W 10 -n 4 $EBROOTR/lib/R/library/snow/RMPISNOW <script_snow.R

echo -e "\\n# Running on $((SLURM_NTASKS + 1)) tasks\n\n"
srun -W 10 -n $((SLURM_NTASKS + 1)) -O $EBROOTR/lib/R/library/snow/RMPISNOW <script_snow.R

As an additional precaution, the -W option ensures that the srun command ends 10 seconds after the first process finishes.

Note that we use the EBROOTR environment variable for the path to the RMPISNOW script, which is needed to run this example. This variable is defined by all EasyBuild-installed R modules, so the line does not need to be adapted when switching to a newer version of R. The RMPISNOW command takes care of some initialisations, in particular when the job spans multiple nodes.

There is another important thing to note in this example. If you analyse the timings in the results carefully, you will see that the 3-task case runs in half the time of the 2-task case, and the 4-task case in little over one third of that time. This is because one of the MPI tasks is used internally as the master process and does not do much work, while the remaining tasks build the cluster and do the computations in the parLapply call. This is solved with the last srun call: we start one more task than requested in the #SBATCH lines, which we can do by adding the -O or --overcommit flag to the srun command. We now get a speedup of about 4 compared to the timing with 2 tasks, which was effectively a sequential run as there was a master and only a single worker.

Examples with foreach

foreach for shared memory (multi-core) computation

For this example, we combine the foreach package with the doParallel adapter package and with parallelly to determine the number of available cores.

The R script script_foreach_MC.R is:

library(foreach)
library(doParallel)
library(parallelly)

sprintf( "Running on %d core(s).", availableCores())
registerDoParallel(cores = availableCores())

system.time(
    foreach(i = 1:100) %dopar% sum(sort(runif(1e7)))  # parallel execution
)

The corresponding job script submit.slurm is

#!/bin/bash
#SBATCH --job-name=R_foreach_doParallel_test
#SBATCH --partition=standard
#SBATCH --time=2:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --hint=nomultithread
#SBATCH --output=%x_%j.txt
#SBATCH --account=project_46YXXXXXX

module load LUMI/23.09 partition/C R/4.3.2-cpeGNU-23.09

echo -e "Running Rscript from $(which Rscript).'\n"

echo -e "\n\nStarting on 1 core.\n\n"
srun -n 1 -c 1 Rscript script_foreach_MC.R

echo -e "\n\nStarting on 4 cores.\n\n"
srun -n 1 -c 4 Rscript script_foreach_MC.R

The first srun command starts the example on a single core, so it effectively runs serially, while the second srun command uses 4 cores. When you inspect the results you will notice that the user and system time in the second case are a bit higher, which is normal as parallel computing always comes with some overhead, but from the elapsed time we get a speedup of 3.4.

foreach with the doSNOW adapter for distributed memory computing

The R script script_foreach_SNOW.R is now

library(foreach)
library(doSNOW)

# As in the snow example, makeCluster() without arguments picks up the cluster
# of MPI tasks set up by RMPISNOW.
cl <- makeCluster()
registerDoSNOW(cl)

system.time(
    foreach(i = 1:100) %dopar% sum(sort(runif(1e7)))  # parallel execution
)

stopCluster(cl)

The foreach line is the same as before; only the setup of the parallel back-end differs, so the core of the code remains unchanged.

The script is started with the jobscript submit.slurm:

#!/usr/bin/bash
#SBATCH --job-name=R_foreach_SNOW_test
#SBATCH --partition=small
#SBATCH --time=5:00
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1
#SBATCH --hint=nomultithread
#SBATCH --output=%x_%j.txt
#SBATCH --account=project_46YXXXXXX

module load LUMI/23.09 partition/C
module load R/4.3.2-cpeGNU-23.09

echo -e "\\n# Running on 2 task\n\n"
srun -W 10 -n 2 $EBROOTR/lib/R/library/snow/RMPISNOW <script_foreach_SNOW.R

echo -e "\\n# Running on 3 task\n\n"
srun -W 10 -n 3 $EBROOTR/lib/R/library/snow/RMPISNOW <script_foreach_SNOW.R

echo -e "\\n# Running on 4 tasks\n\n"
srun -W 10 -n 4 $EBROOTR/lib/R/library/snow/RMPISNOW <script_foreach_SNOW.R

echo -e "\\n# Running on $((SLURM_NTASKS + 1)) tasks\n\n"
srun -W 10 -n $((SLURM_NTASKS + 1)) -O $EBROOTR/lib/R/library/snow/RMPISNOW <script_foreach_SNOW.R

The user and system time are rather irrelevant now, as they are measured for the master process. That they are still rather high is likely due to a busy-waiting strategy. The elapsed time again shows the behaviour we have seen before, with the time for 2 tasks corresponding to serial execution of the foreach loop.

foreach with the doMPI adapter for distributed memory computing

The R script script_foreach_MPI.R is now

library(foreach)
library(doMPI)

cl <- startMPIcluster()  # use verbose = TRUE for detailed worker message output
registerDoMPI(cl)

system.time(
    foreach(i = 1:100) %dopar% sum(sort(runif(1e7)))  # parallel execution
)

closeCluster(cl)
mpi.quit()

The foreach line is again the same as before, but the setup is different.

The script is started with the jobscript submit.slurm:

#!/usr/bin/bash
#SBATCH --job-name=R_foreach_MPI_test
#SBATCH --partition=small
#SBATCH --time=5:00
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1
#SBATCH --hint=nomultithread
#SBATCH --output=%x_%j.txt
#SBATCH --account=project_46YXXXXXX

module load LUMI/23.09 partition/C
module load R/4.3.2-cpeGNU-23.09

echo -e "\\n# Running on 2 task\n\n"
srun -n 2 Rscript script_foreach_MPI.R

echo -e "\\n# Running on 3 task\n\n"
srun -n 3 Rscript script_foreach_MPI.R

echo -e "\\n# Running on 4 task\n\n"
srun -n 4 Rscript script_foreach_MPI.R

echo -e "\\n# Running on $((SLURM_NTASKS + 1)) tasks\n\n"
srun -n $((SLURM_NTASKS + 1)) -O Rscript script_foreach_MPI.R

It appears that here too one process is used as the master in the background while the other processes execute the foreach work. With this adapter we can start the script in the way we are used to starting parallel programs, but we still need to oversubscribe (because of the master process) to get the best speedup from using 4 tasks (and hence 4 cores).

Information on the web

User-installable modules (and EasyConfigs)

Install with the EasyBuild-user module:

eb <easyconfig> -r

To access module help after installation and to see for which stacks and partitions the module is installed, use module spider R/<version>.

EasyConfig:

Technical documentation

For some users, this module may be a replacement for the HPE-provided version in cray-R.

The -raw versions of the EasyConfigs provide R without any additional packages, while the other EasyConfigs offer examples of how to add packages.

Some notes

  • bin/R is really only a script that starts R. The relevant executable is in lib64/R/bin/exec.

EasyBuild

R packages considered for inclusion

Version 4.2.1 for cpeGNU 22.06, 22.08

  • Worked from the EasyBuilders EasyConfig, but removed all OpenGL-related stuff and also removed all extensions to create the -raw version, which has no external packages at all.

  • PROBLEM: Suffers from the problem of multiple Cray LibSci libraries being linked in, and it is not clear how they come in.

    • The R sessionInfo() command shows that the non-MPI multithreaded library is used.

    • The configure script does discover the right option to compile with OpenMP.

    • Enforcing the OpenMP compiler flag through toolchainopts alone fails, however: the flag is not used at link time, resulting in undefined OpenMP symbols when linking.

    SOLUTION: Enforce OpenMP through toolchainopts and manually add -fopenmp to LDFLAGS in preconfigopts.

  • Extensions: List based on the foss-2022a version of R in EasyBuild.

    • Rmpi:

      • Typically uses an EasyBlock, but that EasyBlock does not support the Cray environment, so we avoid it and instead manually build the right --configure-args flag for the R installation command (a hedged sketch of what such arguments can look like is given after these notes).

      • There is still a problem: loading Rmpi.so fails because mpi_universe_size cannot be found. This is a routine defined in Rmpi itself, but it is only compiled in certain cases, and it looks like routines that reference this function are not correctly disabled when it is not included in the compilation.

        In fact, the configure script does correctly determine that it should add -DMPI2 to the command line. However, it does so after a faulty -I flag that does not contain a directory, so the compiler interprets the -DMPI2 flag as the argument of -I instead.

        The solution to this last problem is a bit complicated.

        • The empty -I argument is a bug in the configure script that occurs when --with-Rmpi-include is not used.

        • Adding just any directory with that argument does not work; the configure script checks if it contains mpi.h. We used an environment variable to find the directory that Cray MPI uses.

        • But adding --with-Rmpi-include also requires adding --with-Rmpi-libpath for which we used the same environment variable.

        • So hopefully these options do not conflict with anything the Cray compiler wrappers add. --with-Rmpi-include and --with-Rmpi-libpath really should not be needed when using the Cray wrappers.
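
For illustration only, a hedged sketch of what such configure arguments amount to when installing Rmpi interactively from within R rather than through EasyBuild. MPICH_DIR is an assumption here (the environment variable that the cray-mpich module sets to the MPI installation directory), and --with-Rmpi-type=MPICH2 is a guess at a suitable setting for Cray MPICH; neither value is taken from the EasyConfig.

# Hedged sketch: interactive installation of Rmpi with explicit configure
# arguments. MPICH_DIR and the MPICH2 type are assumptions, not values taken
# from the LUMI EasyConfigs.
mpich_dir <- Sys.getenv("MPICH_DIR")

install.packages(
    "Rmpi",
    repos = "https://cloud.r-project.org",
    configure.args = paste0(
        "--with-Rmpi-include=", mpich_dir, "/include ",
        "--with-Rmpi-libpath=", mpich_dir, "/lib ",
        "--with-Rmpi-type=MPICH2"
    )
)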

Version 4.3.1 for 22.12

  • Based on the 4.2.1 work but now focused on adding packages for parallel computing (and developed the USER.md page explaining options for parallel computing).

  • One way to figure out how to do this is to install the -raw version and then add the desired packages by hand and see what other packages R pulls in and when it does so.

Version 4.3.2 for 23.09

  • Quick port with minor version updates of the 22.12 one.

Version 4.4.1 for 24.03

  • Quick port with version updates of the 23.09 EasyConfigs.

  • Changed the naming to stress that we link with the multithreaded BLAS libraries.

Archived EasyConfigs

The EasyConfigs below are additional EasyConfigs that are not directly available on the system for installation. Users are advised to use the newer ones; these archived EasyConfigs are unsupported. They are still provided as a source of information, e.g., to understand the configuration that was used for earlier work on the system.