This course is part of the "LRZ AI Training Series", a series of courses aimed at the needs and expectations of data analytics, big data and AI users at LRZ. Although the series focuses on these users and their use cases, this session and all other courses in the AI Training Series are open to all interested parties.
This course for academic participants from Germany will be organised as a hybrid event: participants can attend either on site at LRZ in Garching near Munich or online.
Contents
In this course we will provide an introduction to the Linux Cluster, a high performance computing (HPC) system operated by the Leibniz Supercomputing Centre (LRZ).
First, a very short introduction to HPC systems in general and their characteristics will be provided. Then, the Linux Cluster at LRZ will be presented in detail, with a focus on the different compute and storage systems that make up the cluster as well as an outlook on future systems. Dedicated hands-on sessions follow. These will not only cover the basics (connecting to the cluster, navigating the file system), but also provide a good understanding of the cluster's critical components, such as the general software environment (including the Spack package manager) and the Slurm Workload Manager. Although the Linux Cluster is a traditional HPC environment, the session will turn the spotlight on the particular needs of the data analytics, machine learning and AI communities. Overall, this will empower participants to successfully run select data analytics and machine learning jobs on the LRZ Linux Cluster.
The material will be presented as a combination of lectures, demos and hands-on sessions, with a focus on the latter. There will be breaks during the session.
There will be three content blocks of roughly one and a half hours each (B=Beginner's, I=Intermediate, A=Advanced content):
- Overview of high performance computing systems for users of Bavarian universities (B)
- HPC basics (system characteristics and cluster topology, levels of parallelism) (B)
- Criteria and process for system access (B)
- Data storage options (B)
- Hands-on Linux Cluster: connecting and (file) system overview (B)
- Environment modules (module list, show, available, load, unload) (B)
- Software Environment: The Spack package manager (B)/(I)
- Using containers on the Linux Cluster (B)/(I)
- Basics of the Slurm Workload Manager and interactive usage (sinfo, salloc, srun) (B)
- Job scripts and processing (sbatch) (B)/(I)
- Job queue and accounting (squeue, sacct) (B)/(I)
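
To give a rough idea of what the job script topic involves, the following is a minimal sketch (not official course material) that assembles the text of a Slurm batch script in Python. The partition name `example_partition`, the module name `example_module` and the command `python train.py` are placeholders chosen for illustration, not actual LRZ settings; the `JobSpec` class and `render_job_script` function are likewise hypothetical helpers.

```python
from dataclasses import dataclass, field


@dataclass
class JobSpec:
    """Settings for one batch job; all default values are placeholders."""
    job_name: str
    command: str
    partition: str = "example_partition"  # hypothetical; `sinfo` lists the real ones
    ntasks: int = 1
    time_limit: str = "00:10:00"          # wall-clock limit as HH:MM:SS
    modules: list[str] = field(default_factory=list)


def render_job_script(spec: JobSpec) -> str:
    """Return the text of a Slurm batch script for the given job settings."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={spec.job_name}",
        f"#SBATCH --partition={spec.partition}",
        f"#SBATCH --ntasks={spec.ntasks}",
        f"#SBATCH --time={spec.time_limit}",
        "#SBATCH --output=%x.%j.out",  # %x expands to the job name, %j to the job ID
        "",
    ]
    # Load the requested environment modules before starting the program.
    lines += [f"module load {name}" for name in spec.modules]
    # Launch the program through srun inside the allocation.
    lines.append(f"srun {spec.command}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    spec = JobSpec(
        job_name="demo",
        command="python train.py",     # placeholder workload
        modules=["example_module"],    # placeholder module name
    )
    print(render_job_script(spec))
```

Assuming the printed text is saved to a file, it would be submitted with `sbatch`, and the resulting job could then be followed with `squeue` and `sacct`. The actual partitions, modules and resource limits depend on the system and are covered in the course.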