MLA 014 Machine Learning Server
Jan 17, 2021

Server-side ML. Training & hosting for inference, with a goal towards serverless. AWS SageMaker, Batch, Lambda, EFS, Cortex.dev


Resources
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (3rd Edition)
Designing Machine Learning Systems
Machine Learning Engineering for Production Specialization
Data Science on AWS: Implementing End-to-End, Continuous AI and Machine Learning Pipelines
Amazon SageMaker Technical Deep Dive Series


Show Notes

After you train an ML model and need to deploy it to production, you have a number of options. If your model runs rarely (1-50x / day), you can set it up as a batch job through various services: it spins up, runs to completion, then takes itself offline. If your model needs to be always available, e.g. powering a customer-facing product with constant usage, you'll deploy it as a persistent endpoint.

Batch models

  • AWS Batch. Lets you run a model deployed as a Docker container (e.g. via ECR) to completion, using price-saving features like spot instances. Much cheaper than SageMaker, but at the cost of spin-up time. A sketch of submitting such a job follows below.
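
A minimal sketch of that submission via boto3, assuming the job queue and job definition (pointing at your ECR image) already exist; all the names here are hypothetical placeholders:

```python
import boto3

# Assumes AWS credentials and region are configured in the environment.
batch = boto3.client("batch")

# Submit one run of the containerized model. "nightly-inference",
# "my-inference-queue", and "my-model-job-def" are hypothetical names;
# the job definition points at a Docker image pushed to ECR, and the
# queue maps to a (spot-backed) compute environment.
response = batch.submit_job(
    jobName="nightly-inference",
    jobQueue="my-inference-queue",
    jobDefinition="my-model-job-def",
    containerOverrides={
        "command": ["python", "predict.py", "--input", "s3://my-bucket/today.csv"],
    },
)

print(response["jobId"])  # poll describe_jobs with this ID to track completion
```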

Endpoint models

  • AWS SageMaker lets you deploy a trained model behind a managed REST endpoint. It also handles training itself, with analytics and various insights into your training runs. See the deployment sketch after this list.
  • GCP Cloud ML. GCP's equivalent to SageMaker.
  • Cortex is similar to SageMaker, with added benefits. It's free and open source: like SageMaker it deploys into your own AWS account, but it saves money via spot instances and avoids the ~40% premium SageMaker instances carry over equivalent EC2. Scale-to-zero (spinning down instances while your ML server has no traffic) is on their roadmap; a huge cost saving. A predictor sketch follows this list.
  • Other competitors include Paperspace Gradient and FloydHub.
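
Deploying an endpoint with the SageMaker Python SDK, as a rough sketch; the image URI, S3 artifact path, and IAM role below are hypothetical placeholders:

```python
from sagemaker.model import Model

# Wrap a trained model artifact (in S3) with an inference container image (in ECR).
# All three values below are placeholders for your own resources.
model = Model(
    image_uri="<account>.dkr.ecr.us-east-1.amazonaws.com/my-model:latest",
    model_data="s3://my-bucket/models/model.tar.gz",
    role="arn:aws:iam::<account>:role/MySageMakerRole",
)

# Stand up an always-on HTTPS endpoint for real-time inference.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
)

# predictor.predict(payload) now invokes the hosted model.
```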
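And a Cortex realtime API, as of Cortex's 0.x interface at the time of this episode: you write a Python predictor class, point a cortex.yaml config at it, and run `cortex deploy`. A minimal sketch (the pickled-model path in `config` is an assumption):

```python
# predictor.py -- Cortex instantiates this class once per replica,
# then calls predict() for each incoming request.
import pickle

class PythonPredictor:
    def __init__(self, config):
        # `config` comes from the `config:` section of cortex.yaml;
        # "model_path" is a hypothetical key pointing at a pickled model.
        with open(config["model_path"], "rb") as f:
            self.model = pickle.load(f)

    def predict(self, payload):
        # `payload` is the parsed JSON request body.
        return {"prediction": self.model.predict([payload["features"]]).tolist()}
```

Cortex then provisions the cluster resources in your AWS account (optionally on spot instances) and exposes the API behind a load balancer.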