Data Analytics, Big Data & AI Training Week

Course/Event Essentials

Event/Course Start
Event/Course End
Event/Course Format
Mixed
Live (synchronous)

Venue Information

Country: Germany
Venue Details: Click here

Training Content and Scope

Scientific Domain
Level of Instruction
Beginner
Intermediate
Sector of the Target Audience
Research and Academia
Language of Instruction

Other Information

Organiser
Event/Course Description

This course series for academic participants from Germany will be organised as a hybrid event with the possibility to attend at LRZ in Garching near Munich or online. 

This course series on Data Analytics, Big Data & AI Training offers the following course modules, which build on each other and can be selected individually during registration, depending on the participants' previous knowledge and experience.

 

HYBRID SESSION
Module: Introduction to GNU/Linux and the Shell
Date: 10.10.2022, 09:00-12:30 CEST
Lecturers: Dr. Johannes Albert-von der Gönna (LRZ), Maja Piskac (LRZ)

This course module provides an introduction to GNU/Linux and the Unix Shell. GNU/Linux is a family of open source operating systems, powering all kinds of computing hardware: wearable and mobile devices, desktop and notebook computers, the majority of web servers and cloud instances as well as most high performance computing clusters and supercomputers. The typical command line interface is a Unix-like shell. It serves as an interactive command environment and scripting language, allowing users to control the system and to automate tasks of varying complexity.

The course module opens with a short historical overview of GNU/Linux and some common concepts and terminology will be explained. Then the focus is directed toward working with the Unix Shell. First, it will be used to navigate the file system and directories of a system, then the mechanisms of file manipulation and ownership will be explored. This is followed by a presentation of additional useful commands and concepts, as well as a discussion of the characteristics of the shell environment.

This material will be presented as a combination of lectures, demos and hands-on sessions, with a focus on the latter. There will be breaks during the session.

Participants will gain essential knowledge and skills necessary to successfully interact with the command line interface of a GNU/Linux system, a basic requirement when using LRZ cluster systems and cloud infrastructure for their own projects.
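
As a rough illustration of the file system navigation, file manipulation and ownership topics listed above, the following sketch performs the same kind of steps from Python and notes the shell command each step mirrors. It is not course material: the directory and file names are hypothetical, and it assumes a POSIX system such as GNU/Linux.

```python
import os
import stat
import tempfile
from pathlib import Path


def demo_file_basics(base: Path) -> dict:
    """Create a directory and a file, then inspect permissions and ownership."""
    project = base / "project"          # hypothetical directory name
    project.mkdir()                     # shell: mkdir project
    notes = project / "notes.txt"       # hypothetical file name
    notes.write_text("hello\n")         # shell: echo hello > project/notes.txt
    listing = sorted(p.name for p in project.iterdir())  # shell: ls project
    notes.chmod(0o640)                  # shell: chmod 640 project/notes.txt
    info = notes.stat()                 # shell: ls -l project/notes.txt
    return {
        "listing": listing,
        "mode": stat.filemode(info.st_mode),        # e.g. owner rw, group r
        "owned_by_me": info.st_uid == os.getuid(),  # compare owner with current user
    }


# Work in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    result = demo_file_basics(Path(tmp))
    print(result)
```

The mode string returned by `stat.filemode` uses the same notation as `ls -l`, which is the form participants will usually see on the command line.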

 

HYBRID SESSION
Module: Introduction to SSH - Q&A
Date: 10.10.2022, 13:30-17:00 CEST
Lecturers: Dr. Martin Ohlerich (LRZ), Dr. Johannes Albert-von der Gönna (LRZ)

This course module provides an introduction to working on remote systems using the Secure Shell (SSH). SSH is a cryptographic network protocol which is typically used to login and execute commands on remote (GNU/Linux) systems.

The course consists of two parts – a self-study tutorial (to be worked through before the course week), and a Q&A session (this module).

The tutorial will guide participants through installing and configuring an SSH client on their own systems. A conceptual and practical introduction to SSH keys will be given, followed by a discussion of more advanced features like SSH tunneling and custom configuration options. Some applications for remote access and file transfer will be introduced. This material will be presented as a combination of lectures, demos and hands-on sessions, with a focus on the latter. The tutorial material will be made available before the course. Participants are expected to familiarize themselves with this material before the actual course Q&A session.

The Q&A session is then meant to provide the time to practice, repeat and deepen the presented workflows. Specifically, there will be time for troubleshooting, questions, and discussions – also for more advanced topics, if desired. There will be breaks during the session.

Participants will gain essential knowledge and skills necessary to use SSH for their later work, which includes connecting to and interacting with the various LRZ cluster systems and cloud infrastructure.

 

HYBRID SESSION
Module: Introduction to Multiuser Cluster Systems at LRZ
Date: 11.10.2022, 09:00-12:30 CEST
Lecturers: Florent Dufour (LRZ), Dr. Johannes Albert-von der Gönna (LRZ)

About 350 years separate the original 1673 Leibniz mechanical calculator from today's Leibniz Supercomputing Centre (LRZ) facilities located on the Campus Garching. And yet, the spirit has not changed. To quote the German mathematician:

"It is beneath the dignity of excellent men to waste their time in calculation when any peasant could do the work just as accurately with the aid of a machine."

This course module will allow participants to live up to their dignity by providing a comprehensive walkthrough and usage guide to the contemporary versions of these machines, which can fill whole buildings.
In a general overview, historical and current developments and trends in the space of scientific computing and cluster systems will be presented. This will, amongst others, address the following questions: How do modern cluster systems work and how are they architected? How did we come to High Performance Computing, High Performance Data Analytics and High Performance AI? What makes a system adequate to a specific workload? How are these systems operated and how are they made available to their users?
In addition, typical interaction methods and usage patterns will be covered, including various possibilities for setting up user environments (e.g. environment variables and modules, user space package managers, containers) as well as tools for resource allocation (i.e. Slurm Workload Manager) and efficient parallelization (MPI, OpenMP, ...). Finally, an overview of different compute clusters as well as their background storage systems operated by LRZ will be provided. The requirements for acquiring access to these systems will be covered as well.

Participants will gain a good understanding of the characteristics of multiuser cluster systems in general and will practise basic methods of typical interaction. They will familiarize themselves with the landscape of cluster systems available at LRZ and this will allow them to choose the right system for their own compute projects.
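
To give a flavour of resource allocation with the Slurm Workload Manager mentioned above, here is a minimal sketch that renders a batch script from a few parameters. The `#SBATCH` options shown are standard Slurm options, but the function name, the job name, the resource values and the command are hypothetical; the options that LRZ systems actually require (for example cluster or partition selection) are not stated in this description and are left out.

```python
def render_slurm_script(job_name, nodes, tasks_per_node, time_limit, command):
    """Return the text of a simple Slurm batch script (illustrative only)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={time_limit}",
        "",
        # srun launches the command on the resources allocated to the job
        f"srun {command}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical request: 2 nodes, 4 tasks per node, 30 minutes of wall time.
script = render_slurm_script(
    job_name="demo",
    nodes=2,
    tasks_per_node=4,
    time_limit="00:30:00",
    command="./my_mpi_program",  # placeholder executable
)
print(script)
```

A script like this would typically be written to a file and submitted with `sbatch`; the scheduler then decides when and where the job runs.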

Prerequisites:

  • Module: Introduction to GNU/Linux and the Shell (or comparable previous knowledge)
  • Module: Introduction to SSH - Q&A (or comparable previous knowledge)

 

HYBRID SESSION
Module: Introduction to Container Technology & Application to AI at LRZ
Date: 11.10.2022, 13:30-17:00 CEST
Lecturers: Florent Dufour (LRZ), PD Dr. Juan Durillo Barrionuevo (LRZ)

Since the introduction of Docker back in 2013, container technology has become the industry standard for software packaging, distribution, and deployment.

Creating a container consists of bundling an application, its dependencies and runtime in a single unit that can later run independently of the underlying infrastructure. Unlike virtual machines, containers are lightweight and yield higher performance while providing greater versatility and interoperability. As containers accommodate an easy, safe, reliable, and scalable way to run applications and pipelines, they are an attractive candidate for high performance computing and artificial intelligence workloads.

With this module, we will showcase the most enticing features and niceties offered by containers. Not only will we explore their history and implementations, but we will also dive into current and cutting-edge uses with a particular emphasis on artificial intelligence tasks, reproducible biomedical pipelines, and automated workflows.

Participants will roll up their sleeves and get their hands on the LRZ Compute Cloud to set containers in action. By the end of the course, participants will be able to transfer their experience and knowledge to their specific use-cases and requirements.

Prerequisites:

  • Module: Introduction to GNU/Linux and the Shell (or comparable previous knowledge)
  • Module: Introduction to SSH - Q&A (or comparable previous knowledge)

 

HYBRID SESSION
Module: Introduction to the LRZ AI Systems
Date: 12.10.2022, 09:00-12:30 CEST
Lecturers: Maja Piskac (LRZ), PD Dr. Juan Durillo Barrionuevo (LRZ)

The aim of this course module is to give an overview of the LRZ AI Systems, and provide participants with the knowledge and skills necessary to efficiently utilise them. The course module consists of mini lectures, demos and hands-on sessions (breaks included) covering the following topics:

  • Introduction to the LRZ AI Resources
    • Overview of the LRZ AI Resources
    • Access to the LRZ AI Resources
    • Introduction to Enroot
  • Fundamentals of Deep Learning
    • Introduction to Neural Networks
    • Introduction to Convolutional Neural Networks
    • Train a Convolutional Neural Network on a GPU
  • Distributed Training of Neural Networks
    • Introduction to Data parallel training
      • Data parallel training of a Convolutional Neural Network on 2 GPUs
    • Introduction to Model parallel training - Pipeline parallel and Tensor parallel
      • Pipeline parallel training of a Convolutional Neural Network on 2 GPUs
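
The idea behind the data parallel training topic above can be sketched without any deep learning framework. In the example below, a tiny linear model stands in for the neural network and two list slices stand in for the two GPUs: each "device" computes a gradient on its own shard of the batch, the gradients are averaged (the role an all-reduce plays on real hardware), and every replica applies the same update. All names and data are hypothetical and are not taken from the course material.

```python
def mse_gradient(w, b, xs, ys):
    """Gradient of the mean squared error for the model y = w * x + b."""
    n = len(xs)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return grad_w, grad_b


def data_parallel_step(w, b, shards, lr):
    """One step: per-shard gradients, averaged, then one shared update."""
    grads = [mse_gradient(w, b, xs, ys) for xs, ys in shards]
    avg_w = sum(g[0] for g in grads) / len(grads)
    avg_b = sum(g[1] for g in grads) / len(grads)
    return w - lr * avg_w, b - lr * avg_b


# Toy data following y = 2x + 1, split evenly across two simulated devices.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
shards = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]

w, b = 0.0, 0.0
for _ in range(2000):
    w, b = data_parallel_step(w, b, shards, lr=0.05)
print(round(w, 3), round(b, 3))
```

Because the shards are the same size, the averaged gradient equals the gradient over the full batch, so the result matches single-device training while the per-shard work could run concurrently.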

Prerequisites:

  • Module: Introduction to GNU/Linux and the Shell (or comparable previous knowledge)
  • Module: Introduction to SSH - Q&A (or comparable previous knowledge)
  • Module: Introduction to Multiuser Cluster Systems at LRZ (or comparable previous knowledge)
  • Module: Introduction to Container Technology & Application to AI at LRZ (or comparable previous knowledge)

 

ONLINE ONLY
Module: Introduction to the LRZ Compute Cloud
Date: 12.10.2022, 13:30-17:00 CEST
Lecturers: PD Dr. Juan Durillo Barrionuevo (LRZ), Florent Dufour (LRZ)

In this course module an overview of the LRZ Compute Cloud (CC) will be provided. The CC is an Infrastructure as a Service (IaaS) cloud operated by the Leibniz Supercomputing Centre (LRZ). 

The Compute Cloud will be introduced as general purpose and flexible infrastructure. A brief overview of the fundamentals of cloud computing will be followed by an introduction to OpenStack, the "Operating System" of the cloud. The use of the CC via web interface and command line will be demonstrated.  

Participants will set up their own JupyterLab servers on the Compute Cloud and will run some light ML inference workloads. This will provide participants with the knowledge and skills necessary to efficiently utilize the LRZ Compute Cloud infrastructure for their own projects. The topics covered are:

  • Fundamentals of cloud computing and Infrastructure as a Service (IaaS) clouds 
  • Overview of the hardware of the LRZ Compute Cloud 
  • Using the LRZ Compute Cloud via the web interface
  • Using the LRZ Compute Cloud via the command line 

Prerequisites:

  • Module: Introduction to GNU/Linux and the Shell (or comparable previous knowledge) 
  • Module: Introduction to SSH - Q&A (or comparable previous knowledge)

 

HYBRID SESSION
Intel® AI Workshop Module #1: Accelerated Machine Learning with Intel®
Date: 13.10.2022, 09:00-12:15 CEST
Lecturers: Roy Allela (Intel), Tobias Andreasen (SigOpt/Intel), Dr. Séverine Habert (Intel), Walter Riviera (Intel)

This workshop session, led by Intel® experts, will feature sessions covering the following topics and Intel® tools & technologies:

  • Hardware acceleration for AI and Intel® oneAPI AI Analytics Toolkit: In this session, we will first introduce the hardware features that are powering AI on Intel. We will then take a first look at the software stack harnessing them, namely the Intel® oneAPI AI Analytics Toolkit.
  • How to accelerate Classical Machine Learning on Intel Architecture: In this session, we will cover the Intel-optimized libraries for Machine Learning. Python* is currently ranked as the most popular programming language and is widely used in Data Science and Machine Learning. We will begin by covering the Intel® Distribution for Python and its optimizations. We will then cover the optimizations for ML Python packages such as Modin, Intel® Extension for Scikit-learn and XGBoost. The presentations will be accompanied by demos to showcase the performance speedup.
  • Enhance your Experimentation with SigOpt: Modeling is a scientific process that requires experimentation to get right. But experimentation is only as effective as the tools applied to it. SigOpt is an Intelligent Experimentation platform that empowers AI modelers to design experiments by asking the right questions, explore experiments to understand their modeling problems, and optimize their experiments to get the best results.
  • Federated Learning: When it comes to AI, we can’t really address the topic without talking about data. The demand for data has increased the need to collect fresh information to feed bigger and more ambitious AI models. However, while data is being collected at different levels, accessing it might not always be possible due to physical constraints (e.g. remote locations) or regulations in place (e.g. GDPR, HIPAA, POPIA). In this session, we’ll learn what Federated Learning is and how we can build a real federation that is able to leverage distributed datasets to train a shared model and solve the data access problem.
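
The Federated Learning session above describes training a shared model on data that never leaves its owner. As a minimal sketch of that idea, the example below uses one common aggregation scheme, a sample-count-weighted average of client parameters (often called federated averaging); the scheme actually presented in the workshop is not stated here, and the one-parameter model, client data and function names are hypothetical.

```python
def local_train(w, samples, lr, epochs):
    """Gradient descent for the model y = w * x on one client's private data."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        w -= lr * grad
    return w


def federated_round(global_w, clients, lr=0.05, epochs=5):
    """Each client trains locally; only the resulting weights are shared."""
    updates = [(local_train(global_w, data, lr, epochs), len(data)) for data in clients]
    total = sum(n for _, n in updates)
    # Average the client weights, weighted by how many samples each client holds.
    return sum(w * n for w, n in updates) / total


# Three clients whose private (x, y) samples all follow y = 3x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]

w = 0.0
for _ in range(20):
    w = federated_round(w, clients)
print(round(w, 4))
```

The aggregation step only ever sees model weights and sample counts, which is what lets the raw data stay with each client.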

 

HYBRID SESSION
Intel® AI Workshop Module #2: Accelerated Deep Learning with Intel®
Date: 13.10.2022, 13:30-17:00 CEST
Lecturers: Akash Dhamasia (Intel), Dr. Séverine Habert (Intel), Vladimir Kilyazov (Intel), Dr. Massoud Rezavand (Intel), Dr. Nikolai Solmsdorf (Intel)

This workshop, led by Intel® experts, will feature sessions covering the following topics and Intel® tools & technologies:

  • Optimize Deep Learning on Intel – Same code just faster! In this session, we present to you what is behind the scenes of Deep Learning with the highly-optimized Intel® oneDNN library in order to get the best-in-class performance on Intel hardware. We then show you Intel® oneDNN in action in DL frameworks such as the Intel-optimized TensorFlow, Intel-optimized PyTorch and the Intel® Extension for PyTorch (IPEX).
  • AI-driven multiphysics HPC applications on Intel architecture - Bridging the gap between HPC and ML: A major challenge in HPC is to make use of and understand the massive amounts of data that are produced when running numerical simulations. For ML, on the other hand, the challenge is to have access to enough data to be confident that our models truly understand the world. Therefore, researchers are looking to replace components of HPC applications with ML models to (a) reduce the need for data storage, (b) accelerate simulations with ML models to capture longer timescales, and (c) achieve accurate simulations in problems to which classical solvers are not applicable. In this session we present this interdisciplinary field and highlight recent achievements on Intel® architectures.
  • Introduction to Neural Network Compression Techniques: In this session, we will explain various network compression techniques in Deep Learning (such as quantization, pruning, and knowledge distillation) and their benefits in terms of performance speed-up. Finally, we will showcase the Intel tools that help you compress your model, like the Intel® Neural Compressor.
  • Easily speed up Deep Learning inference – Write once deploy anywhere! In this session, we will showcase the Intel® Distribution of OpenVINO™ Toolkit, which allows you to optimize models that you trained with TensorFlow* or PyTorch* for high-performance inference. We will demonstrate how to use it to write once and deploy on multiple types of Intel hardware.
  • Uncertainty estimation: In this session, we will talk about the limitations of conventional deep learning techniques, such as being unexplainable, overconfident, and susceptible to adversarial attacks, and why it is important in safety-critical applications to incorporate reliable uncertainty estimation into DNNs for trustworthy and informed decision making.
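
Of the compression techniques named in the session list above, quantization is the easiest to show in a few lines. The sketch below applies symmetric 8-bit quantization to a handful of hypothetical weights: values are mapped to integers in [-127, 127] using a single scale factor and then mapped back, losing at most half a quantization step per value. It illustrates the general idea only and does not use or reproduce the API of the Intel® Neural Compressor.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.003, 1.0]  # hypothetical model weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The worst-case rounding error is half of one quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, max_error <= scale / 2 + 1e-12)
```

Storing 8-bit integers instead of 32-bit floats reduces the memory needed for the weights by roughly a factor of four, which is one source of the speed-up such techniques aim for.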

 

ONLINE ONLY
Module: Introduction to the LRZ Linux Cluster
Date: 14.10.2022, 09:00-12:30 CEST
Lecturers: Dr. Johannes Albert-von der Gönna (LRZ), Dr. Martin Ohlerich (LRZ)

In this course module, a closer look at the LRZ Linux Cluster, a high performance computing (HPC) cluster system operated by the Leibniz Supercomputing Centre (LRZ), will be provided.

Firstly, an overview of the different compute and storage components that constitute the LRZ Linux Cluster will be given. Many of these will be explored in a dedicated hands-on session which will cover the characteristics of the system, including details of the environment module system, the Spack package manager (used to provide most user-facing software) as well as the Slurm Workload Manager job scheduling software.

The material will be presented as a combination of lectures, demos and hands-on sessions, with a focus on the latter. There will be breaks during the session.

Participants will gain the general understanding and skills necessary to efficiently utilize the LRZ supercomputing infrastructure for their own projects, running compute jobs on LRZ HPC systems.

Prerequisites:

  • Module: Introduction to GNU/Linux and the Shell (or comparable previous knowledge)
  • Module: Introduction to SSH - Q&A (or comparable previous knowledge)
  • Module: Introduction to Multiuser Cluster Systems at LRZ (or comparable previous knowledge)

 

ONLINE ONLY
Module: High Performance Data Analytics Using R at LRZ
Date: 14.10.2022, 13:30-17:00 CEST
Lecturers: Dr. Johannes Albert-von der Gönna (LRZ), Dr. Martin Ohlerich (LRZ)

R is a highly popular and powerful programming language for data analysis and graphics, used in many research domains. The Leibniz Supercomputing Centre (LRZ) is addressing the needs of R users by facilitating various ways of working with R on LRZ systems.

R can be employed on the majority of LRZ compute systems like the HPC systems Linux Cluster and SuperMUC-NG as well as the AI Systems for data analytics and deep learning. Additionally, the use of RStudio Server IDE environments is facilitated, which provide a powerful interactive data analytics platform familiar to many R users.

In this course, the different possibilities of using R at LRZ for high performance data analytics and machine learning projects will be demonstrated and practised in hands-on sessions. Guidelines and best practice examples for running R applications on the various systems will be provided. Special attention will be paid to different ways of parallelizing R code in order to utilize LRZ's HPC & AI Systems infrastructure.

Prerequisites:

  • Basic knowledge of R
  • Module: Introduction to GNU/Linux and the Shell (or comparable previous knowledge)
  • Module: Introduction to SSH - Q&A (or comparable previous knowledge)
  • Module: Introduction to Multiuser Cluster Systems at LRZ (or comparable previous knowledge)
  • Module: Introduction to Container Technology & Application to AI at LRZ (or comparable previous knowledge)
  • Module: Introduction to the LRZ AI Systems (or comparable previous knowledge)
  • Module: Introduction to the LRZ Linux Cluster (or comparable previous knowledge)