Scale your Large Distributed Training Jobs with Data and Model Parallelism Optimized for Amazon SageMaker [Level 300]

June 9, 2021
AWS offers a breadth and depth of machine learning (ML) infrastructure for training and inference workloads, available through either a do-it-yourself approach or a fully managed approach with Amazon SageMaker. In this session, explore how to choose the right instance type for ML training and inference based on model size, complexity, throughput, framework choice, inference latency, and portability requirements. Join this session to compare and contrast compute-optimized CPU-only instances, such as Amazon EC2 C4 and C5; high-performance GPU instances, such as Amazon EC2 G4, P3, and P4d; cost-effective, variable-size GPU acceleration with Amazon Elastic Inference; and the high price/performance of Amazon EC2 Inf1 instances powered by custom-designed AWS Inferentia chips.
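
As context for the session's focus on data parallelism on SageMaker, the sketch below shows one way to enable the SageMaker distributed data parallel library through the SageMaker Python SDK. This is a minimal illustration, not the session's own code: the entry-point script name, IAM role ARN, S3 path, instance count, and framework/Python versions are assumptions chosen for the example.

    # Minimal sketch: launch a data-parallel training job with the
    # SageMaker Python SDK. train.py, the role ARN, the S3 path, and
    # the framework versions below are placeholders for illustration.
    from sagemaker.pytorch import PyTorch

    estimator = PyTorch(
        entry_point="train.py",            # hypothetical training script
        role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
        framework_version="1.8.1",         # assumed PyTorch version
        py_version="py36",
        instance_count=2,                  # two nodes for data parallelism
        instance_type="ml.p4d.24xlarge",   # GPU instance type covered in the session
        # Enable the SageMaker distributed data parallel library
        distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
    )

    # Start training; the channel name and S3 input path are placeholders.
    estimator.fit({"training": "s3://my-bucket/training-data"})

With this configuration, SageMaker provisions the cluster, runs the training script on every node, and tears the instances down when the job completes; scaling to more nodes is a matter of raising instance_count.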

Speaker: Shashank Prasanna, AWS Senior Developer Advocate, AI/ML