High performance computing in Python

Course/Event Essentials

Event/Course Start
Event/Course End
Event/Course Format
Online
Live (synchronous)

Venue Information

Country: Slovakia

Training Content and Scope

Scientific Domain
Level of Instruction
Intermediate
Advanced
Sector of the Target Audience
Research and Academia
Industry
Public Sector
Other (general public...)
HPC Profile of Target Audience
Application Users
Application Developers
Data Scientists
System Administrators
Language of Instruction

Other Information

Organiser
Supporting Project(s)
EuroCC2/CASTIEL2
Event/Course Description

This course provides an introduction to high performance computing in Python, focusing on the use of Polars and Apache Spark. During this two-day course, you will gain both theoretical knowledge and hands-on experience with these technologies, enabling you to efficiently process and analyse large amounts of data. The Microsoft Windows operating system is used throughout the course.

Day 1: Polars

1. Introduction to High Performance Computing
Basic terms and concepts of high performance computing
Comparison of traditional tools with modern technologies such as Polars
2. Introduction to Polars
Introduction to the Polars library and its benefits
Installation and environment setup
3. Working with data in Polars
Loading and saving data
Manipulating data frames
Filtering and transformations
4. Data analysis in Polars
Aggregation functions
Merging and joining data frames
Optimising performance in Polars

Day 2: Apache Spark

1. Introduction to Apache Spark
Introduction to Apache Spark and its benefits
Installing and setting up the environment
2. Working with data in Apache Spark
Loading and storing data
RDDs and DataFrames
Basic transformations and actions
3. Data analysis in Apache Spark
Aggregation operations
Working with large data sets