J4K-2019: Arun Gupta: Machine Learning on Kubernetes
Raw Notes from Arun Gupta’s session.
Machine Learning 101
ML Frameworks and Infrastructure. For expert ML practitioners.
- This is where Kubernetes fits
ML Services. Commoditized, managed services, for when you don’t have time to train your own models.
AI Services. Cognitive services: Vision, Speech, Language, Chatbots, Forecasting, Recommendation.
Why ML on Kubernetes
Mentioned that ML is Stateful
Managed Kubernetes control plane, attach data plane
Managed data plane coming this year
Native upstream Kubernetes experience. No forking, patching.
Integration with additional AWS services.
eksctl
Installable with brew.
brew tap weaveworks/tap
brew install weaveworks/tap/eksctl
eksctl create cluster --node-type=p2.xlarge
The p2.xlarge node type gives you a GPU-powered cluster.
Does not install kubectl. That has to be there already.
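Since eksctl expects kubectl to be there already, a small wrapper can check for it before creating the cluster. This is a minimal sketch, not something shown in the session: the function names, the DRY_RUN switch, and the cluster name "ml-demo" are all assumptions made for illustration.

```shell
# Sketch only: helper names and the "ml-demo" cluster name are hypothetical.

# Succeeds only if the named tool can be found by the shell.
require_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Prints the command when DRY_RUN=1, otherwise executes it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Creates a GPU-backed EKS cluster, refusing to start without kubectl.
create_gpu_cluster() {
  cluster_name="${1:-ml-demo}"
  if ! require_tool kubectl; then
    echo "kubectl not found; install it before running eksctl" >&2
    return 1
  fi
  run eksctl create cluster --name "$cluster_name" --node-type p2.xlarge
}
```

Running `DRY_RUN=1 create_gpu_cluster ml-demo` prints the eksctl command without touching AWS, which is a cheap way to review it first.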
Set up Kubernetes for ML Option 1
Train: Set up control plane, EKS cluster.
- Set up as autoscaling group
Inference: Set up another control plane, EKS cluster
This is the dedicated K8s
Set up Kubernetes for ML Option 2
- Use two separate node groups in one EKS cluster
This is the unified K8s
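The unified option could be sketched with eksctl roughly as follows. This was not shown in the session: the cluster name, node group names, instance types, sizes, the "workload" label, and the DRY_RUN switch are assumptions chosen for illustration.

```shell
# Sketch only: names, instance types, and sizes below are illustrative.

# Prints the command when DRY_RUN=1, otherwise executes it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Adds one node group. Arguments: cluster, group name, instance type, max nodes.
add_node_group() {
  run eksctl create nodegroup --cluster "$1" --name "$2" \
    --node-type "$3" --nodes-min 1 --nodes-max "$4" \
    --node-labels "workload=$2"
}

# One control plane, then separate node groups for training and inference.
create_unified_cluster() {
  cluster="${1:-ml-demo}"
  run eksctl create cluster --name "$cluster" --without-nodegroup
  add_node_group "$cluster" training p2.xlarge 4
  add_node_group "$cluster" inference c5.xlarge 4
}
```

Labelling each group lets pods target it with a nodeSelector, so training jobs land on the GPU nodes and inference pods on the cheaper ones.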
Scaling the cluster
Cluster autoscaler: burstable workloads. Scale up based on metrics.
Escalator: Batch or job-based workloads. More suited to ML. ML jobs run for a long time. You don’t want Kubernetes messing with your cluster while a job is running. Aggressively scales up to reduce wait time for pods.
They both take over the auto-scaling knob.
Challenges in setting up containers for ML
Takes days to configure and test.
Must be optimized for performance and scale.
Re-build and re-optimize.
AWS Deep Learning Containers
Optimized and customizable containers for known domains.
Use these as your base images.
Touts twice-as-fast TensorFlow training with AWS-optimized TensorFlow.
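Using one of these containers as a base image might look like the sketch below, which generates a small Dockerfile on top of whatever image you pass in. The image URI is deliberately left as an argument (look up the real one in the AWS documentation). The function name, the train.py script, the /app directory, and the assumption that the base image has python on its PATH are all hypothetical.

```shell
# Sketch only: train.py, /app, and the function name are hypothetical.

# Writes a Dockerfile to stdout that layers a training script on a base image.
# Argument: the base image URI, for example a Deep Learning Container image.
write_dockerfile() {
  base_image="$1"
  cat <<EOF
FROM ${base_image}
WORKDIR /app
COPY train.py /app/train.py
CMD ["python", "/app/train.py"]
EOF
}
```

A typical use would be `write_dockerfile "$BASE_IMAGE" > Dockerfile` followed by `docker build -t my-training .`, with BASE_IMAGE set to the container you picked.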
ML on K8s