Build & deploy data & ML pipelines, hassle-free

The infinitely scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.

“We got over 66% reduction in orchestration code when we moved to Flyte — a huge win!”

— Seth Miller-Zhang, Senior Software Engineer at ZipRecruiter

Bridge the gap between scalability and ease of use

Write locally, execute remotely

Don’t let friction between development and production slow the deployment of new data/ML workflows or introduce production bugs. Flyte enables rapid experimentation with production-grade software. Iterate on and debug workflows locally for tight feedback loops, then run the same code in the cloud.

Scale as fast as your imagination

As your data and ML workflows expand and demand more computing power, your workflow orchestration platform must keep up. If it’s not designed to scale, your platform will require constant monitoring and maintenance. Flyte was built with scalability in mind, ready to handle changing workloads and resource needs.

“Flyte’s scalability, data lineage, and caching capabilities enable us to train hundreds of models on petabytes of geospatial data, giving us an edge in our business.”

— Arno, CTO at Blackshark.ai

Give the power back to data practitioners and scientists

Data scientists, data and ML practitioners, and analytics pipeline builders need to work independently. They shouldn’t have to rely on ML and platform engineers to turn models or training code into production-ready pipelines. Flyte lets user teams build workflows with the Python SDK and deploy them to the Flyte backend themselves.

“With Flyte, we want to give the power back to biologists. We want to stand up something that they can play around with different parameters for their models because not every … parameter is fixed. We want to make sure we are giving them the power to run the analyses.”

— Krishna Yeramsetty, Principal Data Scientist at Infinome

Create extremely flexible data and ML workflows

Track the health of your data and ML workflows at every stage of execution. Trace how data passes between steps to pinpoint the source of errors with ease.

Reuse tasks and workflows present in any project and domain using the reference_task and reference_launch_plan decorators. Share your work across teams to test it out in separate environments.

Your orchestration platform should integrate smoothly with the tools and services your teams use. Flyte offers both platform- and SDK-level integrations, making it easy to incorporate into your data/ML workflows as a plug-and-play service.

Resource allocation shouldn’t require complex infrastructure changes or decisions at compile time. Flyte lets you fine-tune resources from within your code — at runtime or with real-time resource calculations — without having to tinker with the underlying infrastructure.

End-to-end data lineage | Collaborate with reusable components | Integrate at the platform level | Allocate resources dynamically

For data, ML and analytics

import os

import flytekit
import pandas as pd
from flytekit import Resources, kwtypes, task, workflow
from flytekit.types.file import CSVFile, FlyteFile
from flytekitplugins.sqlalchemy import SQLAlchemyConfig, SQLAlchemyTask

DATABASE_URI = (
    "postgresql://reader:NWDMCE5xdipIjRrp@hh-pgsql-public.ebi.ac.uk:5432/pfmegrnargs"
)

extract_task = SQLAlchemyTask(
    "extract_rna",
    query_template="""select len as sequence_length, timestamp from rna where len >= {{ .inputs.min_length }} and len <= {{ .inputs.max_length }} limit {{ .inputs.limit }}""",
    inputs=kwtypes(min_length=int, max_length=int, limit=int),
    output_schema_type=pd.DataFrame,
    task_config=SQLAlchemyConfig(uri=DATABASE_URI),
)


@task(requests=Resources(mem="700Mi"))
def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Add date and time columns; drop timestamp column."""
    timestamp = pd.to_datetime(df["timestamp"])
    df["date"] = timestamp.dt.date
    df["time"] = timestamp.dt.time
    df.drop("timestamp", axis=1, inplace=True)
    return df


@task(requests=Resources(mem="700Mi"))
def load(df: pd.DataFrame) -> CSVFile:
    """Load the dataframe to a csv file."""
    csv_file = os.path.join(flytekit.current_context().working_directory, "rna_df.csv")
    df.to_csv(csv_file)
    return FlyteFile(path=csv_file)


@workflow
def etl_workflow(
    min_length: int = 50, max_length: int = 200, limit: int = 10
) -> CSVFile:
    """Build an extract, transform and load pipeline."""
    return load(
        df=transform(
            df=extract_task(min_length=min_length, max_length=max_length, limit=limit)
        )
    )
Data ETL
import pandas as pd
from flytekit import Resources, task, workflow
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression


@task(requests=Resources(mem="700Mi"))
def get_data() -> pd.DataFrame:
    """Get the wine dataset."""
    return load_wine(as_frame=True).frame


@task
def process_data(data: pd.DataFrame) -> pd.DataFrame:
    """Simplify the task from a 3-class to a binary classification problem."""
    return data.assign(target=lambda x: x["target"].where(x["target"] == 0, 1))


@task
def train_model(data: pd.DataFrame, hyperparameters: dict) -> LogisticRegression:
    """Train a model on the wine dataset."""
    features = data.drop("target", axis="columns")
    target = data["target"]
    return LogisticRegression(**hyperparameters).fit(features, target)


@workflow
def training_workflow(hyperparameters: dict) -> LogisticRegression:
    """Put all of the steps together into a single workflow."""
    data = get_data()
    processed_data = process_data(data=data)
    return train_model(
        data=processed_data,
        hyperparameters=hyperparameters,
    )
Machine Learning
import pandas as pd
import plotly
import plotly.graph_objects as go
import pycountry
from flytekit import Deck, task, workflow, Resources


@task(requests=Resources(mem="1Gi"))
def clean_data() -> pd.DataFrame:
    """Clean the dataset."""
    df = pd.read_csv("https://covid.ourworldindata.org/data/owid-covid-data.csv")
    filled_df = (
        df.sort_values(["people_vaccinated"], ascending=False)
        .groupby("location")
        .first()
        .reset_index()
    )[["location", "people_vaccinated", "date"]]
    filled_df = filled_df.dropna()
    countries = [country.name for country in list(pycountry.countries)]
    country_df = filled_df[filled_df["location"].isin(countries)]
    return country_df


@task(disable_deck=False)
def plot(df: pd.DataFrame):
    """Render a Choropleth map."""
    df["text"] = df["location"] + "<br>" + "Last updated on: " + df["date"]
    fig = go.Figure(
        data=go.Choropleth(
            locations=df["location"],
            z=df["people_vaccinated"].astype(float),
            text=df["text"],
            locationmode="country names",
            colorscale="Blues",
            autocolorscale=False,
            reversescale=True,
            colorbar_title="Population",
            marker_line_color="darkgray",
            marker_line_width=0.5,
        )
    )

    fig.update_layout(
        title_text="Number of people who received at least one dose of a COVID-19 vaccine",
        geo_scope="world",
        geo=dict(
            showframe=False, showcoastlines=False, projection_type="equirectangular"
        ),
    )
    Deck("Choropleth Map", plotly.io.to_html(fig))


@workflow
def analytics_workflow():
    """Prepare a data analytics workflow."""
    plot(df=clean_data())
Analytics

“Gojek is experiencing rapid growth and incorporating machine learning into various products. To sustain this growth and guarantee success, a reliable and scalable pipeline solution is critical. Flyte plays a vital role as a key component of Gojek’s ML Platform by providing exactly that.”

— Pradithya Aria Pura, Principal Software Engineer at Gojek

Minimal maintenance overhead

Set up once and revisit only if you need to make Flyte more extensible.

Robust and scalable like never before

Deploy your data and ML workflows with confidence. Focus on what matters most — the business logic of your workflows.

Vibrant community

Receive timely responses to your questions on Slack, with an average response time of 6–8 hours or less.

From data processing to distributed model training, Flyte streamlines the entire data & ML workflow development process

Begin your Flyte journey today

“It’s not an understatement to say that Flyte is really a workhorse at Freenome!”

— Jeev Balakrishnan, Software Engineer at Freenome

Union.ai

Union lets you leverage Flyte without worrying about infrastructure constraints and setup.