Build & deploy data & ML pipelines, hassle-free

The infinitely scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.


“We got over 66% reduction in orchestration code when we moved to Flyte — a huge win!”

— Seth Miller-Zhang, Senior Software Engineer at ZipRecruiter


See Flyte in Action

How LLMs Are Transforming Computer Vision (Sage Elliott, Jan 5, 2024)

Flyte for GCP: A Platform Engineer’s Overview (David Espejo, Jan 3, 2024)

Building Large-Scale Xarray Datasets for Geospatial Computing with Union.ai and Flyte (David Espejo, May 15, 2025)

Write locally, execute remotely

Friction between development and production slows the deployment of new data/ML workflows and lets more bugs slip into production. Flyte enables rapid experimentation with production-grade software: iterate on your workflows locally for tight feedback loops, then run and debug the same code in the cloud.

Scale as fast as your imagination

As your data and ML workflows expand and demand more computing power, your workflow orchestration platform must keep up. If it’s not designed to scale, your platform will require constant monitoring and maintenance. Flyte was built with scalability in mind, ready to handle changing workloads and resource needs.

“Flyte’s scalability, data lineage, and caching capabilities enable us to train hundreds of models on petabytes of geospatial data, giving us an edge in our business.”

— Arno, CTO at Blackshark.ai

Give the power back to data practitioners and scientists

Data scientists, data and ML practitioners, and analytics pipeline builders need to work independently. They shouldn’t have to rely on ML and platform engineers to turn their models or training code into production-ready pipelines. With Flyte, these teams build workflows using the Python SDK and deploy them to the Flyte backend themselves.

“With Flyte, we want to give the power back to biologists. We want to stand up something that they can play around with different parameters for their models because not every … parameter is fixed. We want to make sure we are giving them the power to run the analyses.”

— Krishna Yeramsetty, Principal Data Scientist at Infinome


Create extremely flexible data and ML workflows

End-to-end data lineage

Track the health of your data and ML workflows at every stage of execution. Follow how data moves between tasks to trace errors back to their source.

Collaborate with reusable components

Reuse tasks and workflows from any project and domain with the reference_task and reference_launch_plan decorators. Share your work across teams and test it in separate environments.

Integrate at the platform level

Your orchestration platform should integrate smoothly with the tools and services your teams use. Flyte offers both platform- and SDK-level integrations, making it easy to incorporate into your data/ML workflows as a plug-and-play service.

Allocate resources dynamically

Resource allocation shouldn’t require complex infrastructure changes or decisions made at compile time. Flyte lets you fine-tune resources from within your code, either at runtime or from values calculated on the fly, without tinkering with the underlying infrastructure.


For data, ML and analytics

Data: an extract, transform and load (ETL) pipeline

import os

import flytekit
import pandas as pd
from flytekit import Resources, kwtypes, task, workflow
from flytekit.types.file import CSVFile, FlyteFile
from flytekitplugins.sqlalchemy import SQLAlchemyConfig, SQLAlchemyTask

DATABASE_URI = (
    "postgresql://reader:NWDMCE5xdipIjRrp@hh-pgsql-public.ebi.ac.uk:5432/pfmegrnargs"
)

extract_task = SQLAlchemyTask(
    "extract_rna",
    query_template="""select len as sequence_length, timestamp from rna where len >= {{ .inputs.min_length }} and len <= {{ .inputs.max_length }} limit {{ .inputs.limit }}""",
    inputs=kwtypes(min_length=int, max_length=int, limit=int),
    output_schema_type=pd.DataFrame,
    task_config=SQLAlchemyConfig(uri=DATABASE_URI),
)


@task(requests=Resources(mem="700Mi"))
def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Add date and time columns; drop timestamp column."""
    timestamp = pd.to_datetime(df["timestamp"])
    df["date"] = timestamp.dt.date
    df["time"] = timestamp.dt.time
    df.drop("timestamp", axis=1, inplace=True)
    return df


@task(requests=Resources(mem="700Mi"))
def load(df: pd.DataFrame) -> CSVFile:
    """Load the dataframe to a csv file."""
    csv_file = os.path.join(flytekit.current_context().working_directory, "rna_df.csv")
    df.to_csv(csv_file)
    return FlyteFile(path=csv_file)


@workflow
def etl_workflow(
    min_length: int = 50, max_length: int = 200, limit: int = 10
) -> CSVFile:
    """Build an extract, transform and load pipeline."""
    return load(
        df=transform(
            df=extract_task(min_length=min_length, max_length=max_length, limit=limit)
        )
    )

ML: a model training pipeline

import pandas as pd
from flytekit import Resources, task, workflow
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression


@task(requests=Resources(mem="700Mi"))
def get_data() -> pd.DataFrame:
    """Get the wine dataset."""
    return load_wine(as_frame=True).frame


@task
def process_data(data: pd.DataFrame) -> pd.DataFrame:
    """Simplify the task from a 3-class to a binary classification problem."""
    return data.assign(target=lambda x: x["target"].where(x["target"] == 0, 1))


@task
def train_model(data: pd.DataFrame, hyperparameters: dict) -> LogisticRegression:
    """Train a model on the wine dataset."""
    features = data.drop("target", axis="columns")
    target = data["target"]
    return LogisticRegression(**hyperparameters).fit(features, target)


@workflow
def training_workflow(hyperparameters: dict) -> LogisticRegression:
    """Put all of the steps together into a single workflow."""
    data = get_data()
    processed_data = process_data(data=data)
    return train_model(
        data=processed_data,
        hyperparameters=hyperparameters,
    )

Analytics: a choropleth map of COVID-19 vaccinations

import pandas as pd
import plotly
import plotly.graph_objects as go
import pycountry
from flytekit import Deck, task, workflow, Resources


@task(requests=Resources(mem="1Gi"))
def clean_data() -> pd.DataFrame:
    """Clean the dataset."""
    df = pd.read_csv("https://covid.ourworldindata.org/data/owid-covid-data.csv")
    filled_df = (
        df.sort_values(["people_vaccinated"], ascending=False)
        .groupby("location")
        .first()
        .reset_index()
    )[["location", "people_vaccinated", "date"]]
    filled_df = filled_df.dropna()
    countries = [country.name for country in list(pycountry.countries)]
    country_df = filled_df[filled_df["location"].isin(countries)]
    return country_df


@task(disable_deck=False)
def plot(df: pd.DataFrame):
    """Render a Choropleth map."""
    df["text"] = df["location"] + "<br>" + "Last updated on: " + df["date"]
    fig = go.Figure(
        data=go.Choropleth(
            locations=df["location"],
            z=df["people_vaccinated"].astype(float),
            text=df["text"],
            locationmode="country names",
            colorscale="Blues",
            autocolorscale=False,
            reversescale=True,
            colorbar_title="Population",
            marker_line_color="darkgray",
            marker_line_width=0.5,
        )
    )

    fig.update_layout(
        title_text="Number of people who received at least one dose of COVID-19 vaccine",
        geo_scope="world",
        geo=dict(
            showframe=False, showcoastlines=False, projection_type="equirectangular"
        ),
    )
    Deck("Choropleth Map", plotly.io.to_html(fig))


@workflow
def analytics_workflow():
    """Prepare a data analytics workflow."""
    plot(df=clean_data())

“Gojek is experiencing rapid growth and incorporating machine learning into various products. To sustain this growth and guarantee success, a reliable and scalable pipeline solution is critical. Flyte plays a vital role as a key component of Gojek’s ML Platform by providing exactly that.”

— Pradithya Aria Pura, Principal Software Engineer at Gojek

One platform for your workflow orchestration needs

Manage the lifecycle of your workflows on one centralized platform, with ease and at scale, without fragmenting tooling across your data, ML & analytics stacks.