PyTorch has released video tutorials on Fully Sharded Data Parallel

The main goal of the tutorial series is to help users build experience with FSDP for distributed AI training, and more videos are expected to be added over time.

PyTorch today announced a new series of 10 video tutorials on Fully Sharded Data Parallel (FSDP), guided by AI/PyTorch partner engineer Less Wright, who also spoke at NVIDIA's Fall GTC 2022.

Introducing what users will learn, Less Wright says, “Whether you’re training a 100 million or 1 trillion parameter model, the series will allow users to train models more effectively, along with short deep dives into various aspects of FSDP.”
Wright believes the main goal of the 10-part series is to help users build experience with FSDP for distributed AI training. He also says new videos will be added to the series as new features land in FSDP.
For example, the first video in the series, titled ‘Accelerate Training Speed with the FSDP Transformer Wrapper,’ is a tutorial on how to use the new FSDP transformer wrapping policy. Unlike the standard wrapper, which makes wrapping decisions based on the number of parameters, the transformer wrapping policy understands how the model is structured and finds appropriate sharding boundaries.

Simply put, it shows users how to implement the transformer wrapping policy and speed up model training by up to 2x.
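For readers who want a sense of what this looks like in practice, here is a minimal sketch based on PyTorch's public FSDP API rather than the video itself. It uses `nn.TransformerEncoderLayer` as a stand-in transformer block and assumes the script is launched with `torchrun`, one process per GPU.

```python
import functools

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy

# Assumes launch via torchrun, one process per GPU.
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# A small stand-in model built from nn.TransformerEncoderLayer blocks.
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8), num_layers=6
).cuda()

# Wrap each transformer block as its own FSDP unit, rather than grouping
# layers by a parameter-count threshold (the size-based default policy).
auto_wrap_policy = functools.partial(
    transformer_auto_wrap_policy,
    transformer_layer_cls={nn.TransformerEncoderLayer},
)

fsdp_model = FSDP(model, auto_wrap_policy=auto_wrap_policy)
```

Because every transformer block becomes its own FSDP unit, parameters can be gathered and freed block by block, which is where the reported speedups come from.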

Other parts of the series cover FSDP mixed precision training, sharding strategies, backward prefetching, and fine-tuning models.
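As a rough sketch of what those topics map to in code, the snippet below shows the corresponding FSDP constructor options from PyTorch's public API; the specific values (bf16, full sharding, pre-prefetching) are illustrative assumptions, not recommendations from the video series.

```python
import torch
from torch.distributed.fsdp import (
    BackwardPrefetch,
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    ShardingStrategy,
)

# Mixed precision policy: parameters, gradient reductions, and buffers in bf16.
bf16_policy = MixedPrecision(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.bfloat16,
    buffer_dtype=torch.bfloat16,
)

fsdp_model = FSDP(
    model,  # e.g. the transformer model from the previous sketch
    mixed_precision=bf16_policy,
    # FULL_SHARD shards parameters, gradients, and optimizer state across ranks.
    sharding_strategy=ShardingStrategy.FULL_SHARD,
    # Prefetch the next unit's parameters during the backward pass.
    backward_prefetch=BackwardPrefetch.BACKWARD_PRE,
)
```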
Meta recently moved the PyTorch project to the nonprofit Linux Foundation, under the newly established PyTorch Foundation. Its main goal will be to drive adoption of AI and deep learning by fostering and sustaining an ecosystem of open-source, vendor-neutral projects built around PyTorch.