Amazon SageMaker: Train models with tens or hundreds of billions of parameters

State-of-the-art models are rapidly increasing in size and complexity. These models can be difficult to train because of the cost, the time, and the skill sets required to optimize memory and compute. In this session, learn how Amazon SageMaker enables customers to train large models by using clusters of accelerated compute instances together with software libraries that partition models and optimize communication between instances. Learn concepts and techniques such as pipeline parallelism, tensor parallelism, optimizer state sharding, and activation checkpointing. The session also discusses best practices, tips, and pitfalls in configuring training for these state-of-the-art large models.
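To give a feel for why models of this size have to be partitioned, here is a minimal back-of-the-envelope sketch in Python. It is not taken from the session and does not use any SageMaker API. The function name `training_memory_gib` and the per-parameter byte counts are assumptions based on a common rule of thumb for mixed-precision training with Adam (about 16 bytes per parameter for weights, gradients, and optimizer states); activations and temporary buffers are ignored.

```python
def training_memory_gib(num_params, shard_degree=1):
    """Rough per-device memory (GiB) for model states only.

    Assumed bytes per parameter (rule of thumb, not a SageMaker figure):
      2 bytes fp16 weights + 2 bytes fp16 gradients
      12 bytes optimizer states (fp32 master weights, momentum, variance)
    Only the optimizer states are divided by shard_degree, which mimics
    optimizer state sharding. Activations are not counted.
    """
    weights_and_grads = 4 * num_params
    optimizer_states = 12 * num_params / shard_degree
    return (weights_and_grads + optimizer_states) / 2**30


if __name__ == "__main__":
    # Hypothetical 10-billion-parameter model on 1, 8, and 64 devices.
    for degree in (1, 8, 64):
        gib = training_memory_gib(10e9, degree)
        print(f"shard degree {degree:>2}: about {gib:.1f} GiB per device")
```

Under these assumptions, the model states alone for a 10-billion-parameter model are well beyond the memory of a single accelerator, and sharding the optimizer states only reduces part of that footprint. Weights, gradients, and activations still have to be handled by techniques such as pipeline parallelism, tensor parallelism, and activation checkpointing.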

Opening Keynote: Accelerate innovation with ML

Rethink possible: AI/ML innovation stories

Remove unnecessary onboarding friction with real-time fraud detection

Elevate customer experiences with AWS Contact Center Intelligence

Build unique user experiences through personalization

Extract data and insights from your documents

Adding identity verification to your application

Create real-time personalized user experiences faster at scale

Find accurate information faster with intelligent search

Generate ML predictions without writing any code

Prepare data for ML with ease, speed, and accuracy

Use Amazon SageMaker to build high-quality ML models faster

Achieve high-performance and cost-effective model deployment

Implementing MLOps practices with Amazon SageMaker

Streamline content moderation workflows with AI/ML

Three ways ML can transform your developer operations

Choosing the right ML instances for your training and inference deployments

Build custom deep learning environments with AWS Deep Learning Containers

ML at the edge with Amazon SageMaker

DJL: An open-source library to build and deploy deep learning in Java