Speakers: Casper van Leeuwen (NCC Netherlands) & Gyula Ujlaki (NCC Hungary)

As datasets and models grow in size and complexity, mastering distributed training becomes vital. In this video, Casper van Leeuwen from NCC Netherlands breaks down the technical implementation of PyTorch Distributed Data Parallel (DDP), which replicates the model on every GPU and synchronises gradients across multiple nodes during training. Complementing this, Gyula Ujlaki from NCC Hungary presents the strategies behind Model Parallelism, demonstrating how to train massive architectures whose parameters exceed the memory of a single GPU.
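As a rough illustration of the DDP workflow Casper covers, the sketch below wraps a toy model in DistributedDataParallel and runs a few synchronised training steps. The model, tensor sizes, and hyperparameters are placeholders, not taken from the talk; the script assumes a torchrun launch, which sets RANK, LOCAL_RANK, and WORLD_SIZE in the environment.

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun populates RANK, LOCAL_RANK and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model; each rank holds a full replica.
    model = nn.Linear(1024, 1024).to(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    for _ in range(10):
        # Dummy data; in practice a DistributedSampler shards the dataset.
        x = torch.randn(32, 1024, device=local_rank)
        y = torch.randn(32, 1024, device=local_rank)
        optimizer.zero_grad()
        loss = loss_fn(ddp_model(x), y)
        loss.backward()  # gradients are all-reduced across ranks here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with, for example, torchrun --nnodes=2 --nproc_per_node=4 train.py, each process trains on its own data shard while backward passes keep the replicas in lockstep.

For the model-parallel strategy Gyula discusses, a minimal sketch (again with placeholder layer sizes) splits one module across two GPUs so that neither device has to hold the full set of parameters:

```python
import torch
import torch.nn as nn

class TwoGPUModel(nn.Module):
    # Toy split: first half lives on cuda:0, second half on cuda:1.
    def __init__(self):
        super().__init__()
        self.part1 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        self.part2 = nn.Linear(4096, 1024).to("cuda:1")

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        # Activations, not parameters, move between devices.
        return self.part2(x.to("cuda:1"))
```

Production-scale training combines such splits with pipeline scheduling or tensor parallelism so that devices are not left idle; this sketch shows only the basic placement idea.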

CASTIEL 2 & EUROCC2