Raw notes from Adam Gibson's Deep Learning in Production.

Defining Production: It means something different for a startup than for an enterprise, and different again from academia. Each has its own set of tools and expectations.

GPU clusters. On-prem research: something flexible and quick, with little need for complexity management; typically the Python HPC stack. C is sometimes used to implement new neural networks.

Cloud research: AWS/Azure spin up resources as needed.

On-prem production: HPC, video transcoding. These shops use a lot of GPUs, along with resource schedulers such as Mesos and YARN.

I must have missed when he talked about how GPUs are essential for this space.


Hadoop: HDFS and ZooKeeper. 16:39.

Two modes: training and inference (serving a trained model behind an API). GPUs are specialized for matrix computations.
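A minimal sketch of why inference maps onto matrix hardware: a single dense layer's forward pass is just a matrix multiply plus a bias. The weights and shapes here are made up for illustration, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((784, 10))   # weights, as if learned during training
b = np.zeros(10)                     # bias

def infer(x):
    """Inference for one dense layer: a matrix multiply plus bias."""
    return x @ W + b

batch = rng.standard_normal((32, 784))  # a batch of 32 flattened images
logits = infer(batch)
print(logits.shape)  # (32, 10): one score vector per input
```

Training repeats this forward pass (plus a backward pass) millions of times, which is where GPU matrix throughput pays off.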

It started to become difficult to follow due to lots of hopping around without defining the terms well enough, or at all. It must be my ignorance of the space.

At least he showed us how to pronounce Lagom. It's log-AHM.

Training models is difficult and expensive. This is why you need a specialized chip.


ETL: Extract Transform Load.
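A toy sketch of the three ETL stages, using stdlib only; the CSV data and field names are invented for illustration.

```python
import csv
import io

# Extract: read raw records (here, an in-memory CSV standing in for a file).
raw = "label,pixel\n7,0.1\n2,0.9\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# Transform: cast the string fields into numeric types.
records = [(int(r["label"]), float(r["pixel"])) for r in rows]

# Load: hand the cleaned records to the next stage (here, just a list).
dataset = records
print(dataset)  # [(7, 0.1), (2, 0.9)]
```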

Neural nets are made up of tensors. MNIST.
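To make the tensor point concrete with MNIST (batch size and zero data are placeholders): a batch of 28x28 grayscale images is a rank-3 tensor, and flattening it for a dense layer gives a rank-2 tensor, i.e. a matrix.

```python
import numpy as np

# A batch of 64 MNIST-sized images: rank-3 tensor (batch, height, width).
batch = np.zeros((64, 28, 28))
print(batch.ndim, batch.shape)  # 3 (64, 28, 28)

# Flattened for a dense layer: rank-2 tensor (batch, features).
flat = batch.reshape(64, 28 * 28)
print(flat.shape)  # (64, 784)
```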

Problems to think about on a GPU cluster:

  • Memory management. Each GPU doesn't have that much RAM, so you have to shard your problem.

  • Throughput

  • Resource provisioning

  • GPU allocation per job.
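The memory-management point above, sketched in pure Python: since each GPU has limited RAM, a batch gets sharded into near-equal chunks, one per device. This `shard` helper is a hypothetical stand-in; a real cluster would place each chunk on a CUDA device via a scheduler like Mesos or YARN.

```python
def shard(batch, n_gpus):
    """Split a batch into n_gpus near-equal chunks, one per device."""
    base, remainder = divmod(len(batch), n_gpus)
    shards, start = [], 0
    for i in range(n_gpus):
        size = base + (1 if i < remainder else 0)  # spread the leftover items
        shards.append(batch[start:start + size])
        start += size
    return shards

batch = list(range(10))
print(shard(batch, 4))  # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
```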

CUDA