Achieve high-performance and cost-effective model deployment
Scaling model deployments and getting the most from your ML investments requires high-performance, cost-effective inference techniques, including real-time, asynchronous, and batch inference. In this session, learn about the inference options available in Amazon SageMaker, such as multi-container endpoints, inference pipelines, and multi-model endpoints, as well as frameworks such as TensorFlow and PyTorch, Python-based backend servers, and C++/Go-based backend servers. Learn how to pick the best inference option for your ML use case so you can scale to thousands of models across your business.