Amazon SageMaker: Train models with tens or hundreds of billions of parameters

State-of-the-art models are rapidly increasing in size and complexity. These models can be difficult to train because of the cost, time, and skill sets required to optimize memory and compute. In this session, learn how Amazon SageMaker enables customers to train large models by using clusters of accelerated compute instances and software libraries that partition models and optimize communication between instances. Learn concepts and techniques such as pipeline parallelism, tensor parallelism, optimizer state sharding, and activation checkpointing, and discuss best practices, tips, and pitfalls in configuring training for these state-of-the-art large models.
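
For context, the sketch below shows roughly how these techniques might be configured when launching a training job with the SageMaker Python SDK and its model parallelism library (smdistributed.modelparallel). The instance type, parallelism degrees, role, and entry-point script name are illustrative assumptions, not recommendations from the session:

```python
# Minimal sketch: launching a model-parallel training job with the
# SageMaker Python SDK. All values below are placeholder assumptions.
from sagemaker.pytorch import PyTorch

# Options for the SageMaker model parallelism library: pipeline
# parallelism via "partitions"/"microbatches", tensor parallelism via
# "tensor_parallel_degree", optimizer state sharding via
# "shard_optimizer_state".
smp_options = {
    "enabled": True,
    "parameters": {
        "partitions": 4,                # pipeline-parallel model partitions
        "microbatches": 8,              # split each batch to keep the pipeline busy
        "pipeline": "interleaved",      # interleaved pipeline schedule
        "tensor_parallel_degree": 2,    # shard individual layers across devices
        "shard_optimizer_state": True,  # spread optimizer state across data-parallel ranks
        "ddp": True,                    # combine with data parallelism
    },
}

# The library launches worker processes with MPI; one process per GPU
# on an 8-GPU instance.
mpi_options = {"enabled": True, "processes_per_host": 8}

estimator = PyTorch(
    entry_point="train.py",             # assumed script adapted for smdistributed
    role="<your-SageMaker-execution-role>",
    instance_type="ml.p4d.24xlarge",    # accelerated compute instances
    instance_count=2,
    framework_version="1.11.0",
    py_version="py38",
    distribution={
        "smdistributed": {"modelparallel": smp_options},
        "mpi": mpi_options,
    },
)
estimator.fit()
```

Activation checkpointing, by contrast, is typically enabled inside the training script itself rather than in the launcher, for example by wrapping selected layers with a checkpointing utility (such as torch.utils.checkpoint in plain PyTorch) so their activations are recomputed in the backward pass instead of stored.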
