Best practices for pre-training large language models on AWS with Amazon SageMaker
Training large language models (LLMs) with hundreds of millions to trillions of parameters helps organizations achieve state-of-the-art performance, but it comes with infrastructure challenges: LLMs require multi-GPU training and entail significant cost. In this presentation, you will learn how Amazon SageMaker training reduces the time and cost to train and tune large language models without the need to manage infrastructure. We will discuss best practices for loading data efficiently, scaling to thousands of GPUs with the SageMaker model parallelism library and AWS-managed infrastructure, and implementing monitoring and resiliency mechanisms.
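To make the scaling setup concrete, the sketch below builds the kind of `distribution` configuration one would pass to a SageMaker estimator to enable the model parallelism library. The specific parallel degrees, GPU counts, and helper function name are illustrative assumptions, not values from this presentation; consult the SageMaker documentation for the options supported by your framework version.

```python
# Sketch: a SageMaker `distribution` config enabling the model parallelism
# library. All numeric degrees below are placeholder assumptions chosen to
# illustrate the structure, not recommendations from this presentation.

def build_distribution(pipeline_parallel_degree=2,
                       tensor_parallel_degree=4,
                       processes_per_host=8):
    """Return a distribution dict for a hypothetical SageMaker estimator.

    `processes_per_host` is typically the number of GPUs per instance
    (e.g., 8 on a p4d.24xlarge); MPI launches one process per GPU.
    """
    return {
        "smdistributed": {
            "modelparallel": {
                "enabled": True,
                "parameters": {
                    # Split the model's layers into pipeline stages.
                    "pipeline_parallel_degree": pipeline_parallel_degree,
                    # Shard individual layers across GPUs within a stage.
                    "tensor_parallel_degree": tensor_parallel_degree,
                },
            }
        },
        # MPI coordinates the worker processes across hosts.
        "mpi": {"enabled": True, "processes_per_host": processes_per_host},
    }


if __name__ == "__main__":
    # A hypothetical estimator call would receive this dict, e.g.:
    #   PyTorch(..., distribution=build_distribution())
    print(build_distribution())
```

In a real job, this dict would be passed to the estimator's `distribution` argument alongside the training script, instance type, and instance count; SageMaker then provisions and tears down the GPU cluster for you.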