This hands-on tutorial introduces the Heat library, which is designed to scale Python-based array computing and data science workflows to distributed and GPU-accelerated environments. Heat offers a familiar NumPy-like API while distributing memory-intensive operations across nodes and GPUs, building on PyTorch for process-local computation and mpi4py for communication.
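To give a flavour of the programming model, here is a minimal sketch that creates a distributed array and applies NumPy-style operations to it. It assumes a working MPI installation and a recent Heat release; names such as split and the exact launch command may vary with your setup.

```python
import heat as ht

# Create a distributed array, split along axis 0 across all MPI processes.
x = ht.arange(1_000_000, split=0, dtype=ht.float32)

# Element-wise operations and reductions look just like NumPy, but act on the
# process-local chunks and communicate via MPI where needed.
y = ht.sin(x) * 2.0
total = y.sum()

print(total)

# Launched with, for example:  mpirun -n 4 python heat_example.py
```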
Topics covered include:
- Heat Fundamentals: Get started with distributed arrays (DNDarrays), distributed I/O, data decomposition schemes, and array operations.
- Key Functionalities: Explore Heat's multi-node linear algebra, statistics, signal processing, and machine learning capabilities (see the short sketch after this list).
- DIY Development: Learn how to use Heat's infrastructure to build your own multi-node, multi-GPU capable research applications.
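As a small taste of these functionalities, the sketch below combines distributed linear algebra with a simple statistical reduction. The calls shown (ht.random.randn, ht.linalg.matmul, ht.mean) follow the public Heat API, but please check the documentation of your installed version.

```python
import heat as ht

# Two distributed random matrices: `a` is split along its rows,
# `b` is replicated on every process (split=None).
a = ht.random.randn(4000, 200, split=0)
b = ht.random.randn(200, 50, split=None)

# Distributed matrix product followed by a per-column mean,
# i.e. a reduction across all processes.
c = ht.linalg.matmul(a, b)
col_means = ht.mean(c, axis=0)

print(col_means.shape)
```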
Prerequisites:
Participants should bring a laptop and have experience with Python and its scientific ecosystem (e.g., NumPy, SciPy). A basic understanding of MPI is helpful but not required.
Target audience:
Researchers and Research Software Engineers (RSEs) working with large datasets that exceed the memory of a single machine. HPC practitioners who support these scientists or may be interested in contributing to the project are also welcome.
Language:
The course will be held in English.