Since 2014, the scientific software stack on the ULHPC facility has been generated and deployed in an automated and consistent way through the RESIF framework (Revolutionary EB-based Software Installation Framework). RESIF is a wrapper on top of EasyBuild and Lmod designed to efficiently handle user software generation. The main objectives of this project were to fully automate software builds and to support all available toolchains and software sets through a clean hierarchical module layout. This layout makes the stack easier to use and gives users an intuitive interface. I also wanted to enable reproducible and self-contained deployment of the complete software stack, coupled with a strong versioning policy between environments and (typically) yearly release cycles.

  • The first version was the result of a Master's project I proposed to Maxime Schmitt in 2014-2015. It was used to produce the following ULHPC software environments:
  • A large code refactoring (leading to RESIF 2) was performed in 2017 to better handle different software sets and roles across multiple clusters, all piloted through a dedicated control repository. Sarah Peter and Valentin Plugaru were mainly in charge of the updates at this level and produced the following ULHPC software environments:

Yet after these last three environment releases, the limitations of RESIF 2 were clear: the corresponding workflow proved to be quite complex and hard to maintain. Furthermore, the loss of compliance with mainline EasyBuild developments led to an explosion of custom configurations.

The new Aion supercomputer features a different CPU architecture (AMD Epyc instead of Intel Broadwell/Skylake). With its arrival, and to mitigate the identified limitations, I decided to completely rethink the framework.
This resulted in a complete code refactoring that produced the RESIF 3.0 framework, presented in [1] at the ACM PEARC’21 conference.

  1. S. Varrette, E. Kieffer, F. Pinel, E. Krishnasamy, S. Peter, H. Cartiaux, and X. Besseron, “RESIF 3.0: Toward a Flexible & Automated Management of User Software Environment on HPC facility,” in ACM Practice and Experience in Advanced Research Computing (PEARC’21), Virtual Event, 2021.


RESIF 3.0 was validated with the ULHPC team against the 2019b toolchain and currently powers the User Software Environment on ULHPC systems. The ULHPC software modules are organized as depicted below, through module bundles. These bundles use either the Bundle easyblock or the Toolchain easyblock (derived from Bundle) for the ULHPC environment.

These bundles define the ULHPC toolchains, programming languages and compilers for each release (see also the ULHPC Technical Documentation). Example:

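| Name     | Type      | 2019[a] (deprecated) | 2019b (old)        | 2020a (prod)       | 2021a* (devel) |
|----------|-----------|----------------------|--------------------|--------------------|----------------|
| GCCCore  | compiler  | 8.2.0                | 8.3.0              | 9.3.0              | 10.3.0         |
| foss     | toolchain | 2019a                | 2019b              | 2020a              | 2021a          |
| intel    | toolchain | 2019a                | 2019b              | 2020a              | 2021a          |
| binutils |           | 2.31.1               | 2.32               | 2.34               | 2.36           |
| Python   |           | 3.7.2 (and 2.7.15)   | 3.7.4 (and 2.7.16) | 3.8.2 (and 2.7.18) | 3.9.2          |
| LLVM     | compiler  | 8.0.0                | 9.0.1              | 10.0.1             | 11.1.0         |
| OpenMPI  | MPI       | 3.1.4                | 3.1.4              | 4.0.3              | 4.1.1          |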

_*: projections at the time of writing_
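To make the bundle mechanism more concrete, here is a minimal sketch of a module-bundle easyconfig built from the 2019b column of the table. It assumes EasyBuild 4.x easyconfig syntax, where an easyconfig is a Python file of plain assignments. The bundle name `ULHPC-toolchains`, the placeholder homepage and the `moduleclass` are hypothetical. The toolchain each dependency is built with (GCCCore/GCC 8.3.0, following common upstream 2019b conventions) is also an assumption. This is not the actual ULHPC/sw definition.

```python
# Hypothetical module-bundle easyconfig (e.g. ULHPC-toolchains-2019b.eb), sketched
# from the 2019b column of the table; name, homepage and moduleclass are placeholders.
easyblock = 'Bundle'

name = 'ULHPC-toolchains'
version = '2019b'

homepage = 'https://example.org/ulhpc-toolchains'  # placeholder URL
description = """Module bundle loading the toolchains, compilers and languages of the 2019b release."""

# The bundle itself uses the system toolchain (same as EasyBuild's SYSTEM constant)
toolchain = {'name': 'system', 'version': 'system'}

local_gccver = '8.3.0'
local_gcccore = ('GCCCore', local_gccver)

# Entries are (name, version[, versionsuffix, toolchain]); entries without an explicit
# toolchain inherit the bundle's (system) toolchain.
# The sub-toolchains below are assumed, based on usual upstream 2019b easyconfigs.
dependencies = [
    ('GCCCore', local_gccver),
    ('foss', version),
    ('intel', version),
    ('binutils', '2.32', '', local_gcccore),
    ('Python', '3.7.4', '', local_gcccore),
    ('LLVM', '9.0.1', '', local_gcccore),
    ('OpenMPI', '3.1.4', '', ('GCC', local_gccver)),
]

moduleclass = 'devel'  # assumed module class
```

Assume the file is saved as `ULHPC-toolchains-2019b.eb` (hypothetical name). Running `eb ULHPC-toolchains-2019b.eb --robot` would then resolve and build any missing dependency and generate a single module that loads the whole release. Only Python 3 is listed here, because loading a second module with the same name would make Lmod swap one version for the other.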

The code base is publicly available on GitHub (see ULHPC/sw). It is synchronized from our internal repository that pilots the deployment. This tool may thus help other HPC centres consolidate their own software management stack.
