Lyft’s Cloud Native Machine Learning and Data Processing Platform, Now Open Sourced

Flyte makes it easy to create concurrent, scalable, and maintainable workflows for machine learning and data processing.

Hosted, Multi-Tenant, and Serverless

Flyte frees you from wrangling infrastructure, allowing you to concentrate on business problems rather than machines. Because Flyte is a multi-tenant service, you work in your own isolated repo and can deploy and scale without affecting the rest of the platform. Your code is versioned and containerized with its dependencies, and every execution is reproducible.

To provide this level of isolation, Flyte is built directly on Kubernetes, so it gets all the benefits containerization provides: portability, scalability, reliability, and more.

Elastic Scale

Flyte is purpose-built to scale. With a fully distributed, fault-tolerant control plane, we have no single point of failure and can scale to multiple clusters, thousands of nodes, and thousands of concurrent workflows.

Our scale is proven at Lyft, where Flyte has been serving production model training and data processing for over three years, becoming the de facto platform for teams like Pricing, Locations, ETA, Mapping, Autonomous, and more. In fact, Flyte manages over 7,000 unique workflows at Lyft, totaling over 100,000 executions every month, 1 million tasks, and 10 million containers.

Parameters, Data Lineage, and Caching

All Flyte tasks and workflows have strongly typed inputs and outputs. This makes it possible to parameterize your workflows, track rich data lineage, and reuse cached versions of pre-computed artifacts. If, for example, you’re doing hyperparameter optimization, you can easily pass different parameters to each run. Additionally, if a run invokes a task that has already been computed with the same inputs, regardless of who executed it, Flyte reuses the cached output, saving you both time and money.

Versioned, Reproducible, and Shareable

Every entity in Flyte is immutable, with every change explicitly captured as a new version. This makes it easy and efficient for you to iterate, experiment, and roll back your workflows. Furthermore, Flyte enables you to share these versioned tasks across workflows, speeding up your dev cycle by avoiding repetitive work across individuals and teams.
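
The immutability model above can be illustrated with a small plain-Python sketch. This is an assumption-laden toy, not Flyte's registration API: `TaskSpec`, `Registry`, and the `image` field are hypothetical names. It shows the general pattern in which a (name, version) pair can never be redefined, so a change must be published as a new version and older versions remain available for rollback.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class TaskSpec:
    name: str
    version: str
    image: str  # hypothetical field: the container image the task runs in


class Registry:
    """Toy registry in which a (name, version) pair is write-once."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], TaskSpec] = {}

    def register(self, spec: TaskSpec) -> None:
        key = (spec.name, spec.version)
        existing = self._entries.get(key)
        if existing is not None and existing != spec:
            # A change must be captured as a new version, never an overwrite.
            raise ValueError(f"{spec.name}@{spec.version} already exists")
        self._entries[key] = spec

    def get(self, name: str, version: str) -> TaskSpec:
        return self._entries[(name, version)]


if __name__ == "__main__":
    registry = Registry()
    registry.register(TaskSpec("train", "v1", "example/train:1"))
    registry.register(TaskSpec("train", "v2", "example/train:2"))
    # Rolling back means pointing at the older, still-intact version.
    print(registry.get("train", "v1"))
```

Because old versions are never mutated, any workflow that referenced `train` at `v1` keeps resolving to exactly the same definition, which is what makes sharing across teams safe.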

Dynamic and Extensible

Flyte is framework-agnostic and has a growing collection of plugins to assist with all of your workflow needs, including Spark on K8s, AWS Batch, Array Jobs, Hive Qubole, Containers, Pods, and more. It’s easy to contribute a plugin, too! Get started here.

It can also be advantageous to author workflow tasks in a variety of languages, so our SDK can be extended beyond Python to allow true polyglot programming.