Speaker(s):

How Beam ML Optimizes Serving Large Models

Sep 4, 2024, 14:30-14:55 (America/Los_Angeles), Mariposa Grove

Serving ML models at scale is increasingly important, and Beam’s RunInference transform is a great tool for doing it. At the same time, models keep getting larger, and it can be hard to fit them into CPU or GPU memory.

This talk will explore some of the mechanisms that Beam has put in place for large model management so that it can serve your models efficiently without requiring any additional work from the pipeline author. Attendees can expect to come away with an understanding of how Beam loads and serves models, how it optimizes its serving architecture for different model sizes/footprints, and how they can use Beam to serve their models (large or small).
