Deploy ML models for inference at high performance and low cost

High-performance, cost-effective model deployment is critical to maximizing the return on your machine learning (ML) investments. Amazon SageMaker provides a broad and deep set of fully managed deployment features that help you achieve optimal inference performance and cost while reducing the operational burden of deploying and managing models in production. In this session, learn how to use SageMaker inference capabilities to quickly deploy ML models in production at scale. Discover SageMaker deployment options, including infrastructure choices; real-time, serverless, asynchronous, and batch inference; single-model, multi-model, and multi-container endpoints; auto scaling; SageMaker Inference Recommender; model monitoring; and SageMaker MLOps integration. We also cover how to validate the performance of new ML models against production models to prevent costly outages.