AI Training Series - Introduction to the LRZ Linux Cluster

Course/Event Essentials

Event/Course Start
Event/Course End
Event/Course Format
Mixed
Live (synchronous)

Venue Information

Country: Germany

Training Content and Scope

Scientific Domain
Level of Instruction
Beginner
Intermediate
Sector of the Target Audience
Research and Academia
Language of Instruction

Other Information

Organiser
Event/Course Description

In this course, we will provide an introduction to the Linux Cluster, a high-performance computing (HPC) system operated by the Leibniz Supercomputing Centre (LRZ).

First, a very short introduction to HPC systems in general and their characteristics will be provided. Then, the Linux Cluster at LRZ will be presented in detail, with a focus on the different compute and storage systems that make up the cluster. This is followed by dedicated hands-on sessions. These will not only cover the basics (connecting to the cluster, navigating the file system) but also provide a good understanding of the cluster's critical components, such as the general software environment (including the Spack package manager) and the Slurm Workload Manager. Although the Linux Cluster is, at its core, a traditional HPC environment, the session will turn the spotlight on the particular needs of the data analytics, machine learning and AI communities. Overall, this will enable participants to successfully run selected data analytics and machine learning jobs on the LRZ Linux Cluster.

The material will be presented as a combination of lectures, demos and hands-on sessions, with a focus on the latter. There will be breaks during the session.

There will be three content blocks of roughly one and a half hours each (B = Beginner content, I = Intermediate content):

  • Overview of high performance computing systems for users of Bavarian universities (B)
  • HPC basics (system characteristics and cluster topology, levels of parallelism) (B)
  • Criteria and process for system access (B)
  • Data storage options (B)
     
  • Hands-on Linux Cluster: connecting and (file) system overview (B)
  • Environment modules (module list, show, avail, load, unload) (B)
  • Software environment: the Spack package manager (B)/(I)
     
  • Using containers with Charliecloud (B)/(I)
  • Basics of the Slurm Workload Manager and interactive usage (sinfo, salloc, srun) (B)
  • Job scripts and processing (sbatch) (B)/(I)
  • Job queue and accounting (squeue, sacct) (B)/(I)