Reproducible Software Environments & Benchmarks with Ansible and Spack: HiDALGO approach

Course/Event Essentials

Event/Course Start
Event/Course End
Event/Course Format
Online
Live (synchronous)

Venue Information

Country: Germany
Venue Details: Click here

Training Content and Scope

Scientific Domain
Level of Instruction
Beginner
Other
Sector of the Target Audience
Research and Academia
Industry
Public Sector
HPC Profile of Target Audience
Application Users
Application Developers
Data Scientists
System Administrators
Language of Instruction

Other Information

Organiser
Supporting Project(s)
HiDALGO
Event/Course Description

Sergiy Gogolenko will give a talk at HLRS (a HiDALGO partner) on simplifying user software installation on diverse clusters and testbeds, and on creating reproducible software environments and benchmarks with Spack and Ansible for the HiDALGO and HLRS-SANE teams.

Abstract:

In recent decades, science has paid increasing attention to the reproducibility of experiments. In many cases, reproducibility has become a cornerstone measure of high-quality research, or even a prerequisite for publication. In HPC, reproducible benchmarks presume exhaustive reports of the hardware and software setup, as well as an easy and consistent way to replicate the software environment in use.

Unfortunately, users of HPC clusters and testbeds face a number of obstacles when preparing reproducible software environments. In particular, each data center provides its own subset of pre-installed software and has its own policies and restrictions (such as no internet access). Moreover, to reach better performance, the end user usually must install software from source while tracking all of its dependencies, which demands an unnecessarily high level of technical expertise. As a result, installation often becomes a tedious, time-consuming process that discourages people from caring about software reproducibility in particular and from HPC in general.

Spack is an HPC-oriented package manager that solves most of the above-mentioned problems if configured correctly, while Ansible addresses the issue of deploying and configuring Spack with offline mirrors for end users. In addition, Ansible and Spack feature deep integration of the YAML and JSON formats, along with high extensibility via a wide range of built-in and external modules. This makes these tools a perfect combination not only for automated, fully reproducible software installation, but also for documenting hardware and software configurations, together with the installation process, uniformly in a human-readable format that is easy to post-process.
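As a rough illustration of what documenting a hardware and software configuration in a post-processable format can look like, here is a minimal Python sketch using only the standard library. It is not the Ansible+Spack setup presented in the talk; the function names and the example package specs are hypothetical and chosen only for illustration.

```python
import json
import platform


def collect_environment_report(packages=None):
    """Gather basic host facts plus a list of package specs into a plain dict.

    `packages` is a hypothetical list of package spec strings supplied by
    the caller; this sketch does not query Spack or Ansible itself.
    """
    return {
        "hostname": platform.node(),
        "machine": platform.machine(),
        "system": platform.system(),
        "release": platform.release(),
        "python_version": platform.python_version(),
        "packages": sorted(packages or []),
    }


def to_json(report):
    """Serialize the report with stable key order so reports diff cleanly."""
    return json.dumps(report, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Illustrative specs only; the names and versions are assumptions.
    print(to_json(collect_environment_report(["openmpi@4.1.5", "hdf5@1.12.2"])))
```

Storing such a report next to each benchmark result is one simple way to make the software and hardware setup of a run traceable afterwards.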

The purpose of this talk is mainly to teach end users of HPC systems the basics of configuring and using Spack, and to introduce the Ansible+Spack solution that we use for automated installation of the same software on a variety of clusters.

Tentative agenda:

- Introduction to Spack and Ansible (5min)
- Spack from the user perspective: basic usage (10min)
- Configuring Spack (15min)
- Deployment of Spack on versatile hardware with Ansible (10min)
- Reproducible benchmarks and their reporting with Ansible (10min)
- Q&A (10min)