Training Content and Scope
This course provides an introduction to high-performance computing in Python, focusing on the use of Polars and Apache Spark. During this two-day course, you will gain both theoretical knowledge and hands-on experience with these technologies, enabling you to process and analyse large amounts of data efficiently. The Microsoft Windows operating system is used throughout the course.
Day 1: Polars
1. Introduction to High-Performance Computing
Basic terms and concepts of high-performance computing
Comparison of traditional tools with modern technologies such as Polars
2. Introduction to Polars
Introduction to the Polars library and its benefits
Installation and environment setup
3. Working with data in Polars
Loading and saving data
Manipulating data frames
Filtering and transformations
4. Data analysis in Polars
Aggregation functions
Merging and joining data frames
Optimising performance in Polars
Day 2: Apache Spark
1. Introduction to Apache Spark
Introduction to Apache Spark and its benefits
Installing and setting up the environment
2. Working with data in Apache Spark
Loading and storing data
RDDs and DataFrames
Basic transformations and actions
3. Data analysis in Apache Spark
Aggregation operations
Working with large data sets