Best practices for pre-training large language models on AWS with Amazon SageMaker

In this presentation, you will learn how Amazon SageMaker training reduces the time and cost to train and tune large language models without the need to manage infrastructure.

Training large language models (LLMs) with hundreds of millions to trillions of parameters helps organizations achieve state-of-the-art performance, but it comes with infrastructure challenges: LLMs require multi-GPU training and entail significant cost. This presentation shows how Amazon SageMaker training reduces the time and cost of training and tuning LLMs without the need to manage infrastructure. We discuss best practices for data loading, for scaling to thousands of GPUs with the SageMaker model parallel library and AWS-managed infrastructure, and for implementing monitoring and resiliency mechanisms.
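As a rough sketch of the scaling approach described above, the SageMaker model parallel library is typically enabled through the `distribution` argument of a SageMaker estimator. The parameter values, instance type, script name, and S3 path below are illustrative assumptions, not recommendations from the presentation:

```python
# Illustrative sketch: distribution settings that enable the SageMaker
# model parallel library on a multi-GPU training job. The specific
# values (partitions, microbatches, processes per host) are example
# assumptions and should be tuned for your model and cluster.
distribution = {
    "smdistributed": {
        "modelparallel": {
            "enabled": True,
            "parameters": {
                "partitions": 4,    # split the model across 4 GPU partitions
                "microbatches": 8,  # pipeline microbatching for throughput
                "ddp": True,        # combine with data parallelism
            },
        }
    },
    "mpi": {"enabled": True, "processes_per_host": 8},  # e.g. 8 GPUs per host
}

# This dict would then be passed to a SageMaker estimator, for example:
#
# from sagemaker.pytorch import PyTorch
# estimator = PyTorch(
#     entry_point="train.py",           # your training script (assumed name)
#     instance_type="ml.p4d.24xlarge",  # 8x A100 GPUs per instance
#     instance_count=2,
#     framework_version="1.13",
#     py_version="py39",
#     role=role,                        # your SageMaker execution role
#     distribution=distribution,
# )
# estimator.fit({"train": "s3://my-bucket/train/"})  # assumed S3 location
```

Combining pipeline partitions with distributed data parallelism (`"ddp": True`) is what lets a single job scale from one multi-GPU host to many, without changes to the training script's launch mechanics.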