Samhita Alla

Get Your Pipeline Orchestration Groove on with Our Sandbox

TL;DR: Try our free hosted sandbox to orchestrate and run data and machine learning pipelines. The sandbox is beginner-friendly and doesn't require any background in workflow orchestration!

Data and ML orchestration can be a complex and mysterious process for many. We understand that the benefits of orchestration can be difficult to grasp without first-hand experience. That's why we developed a hosted sandbox environment (available courtesy of Union.ai) as a first step to discovering the power of orchestration. Our goal is to help you explore the benefits of orchestration and scalability of pipelines, and to empower you to take the next step in your workflow orchestration journey.

Although the hosted sandbox runs Flyte workflows, the concepts you learn in it apply to workflow orchestration beyond Flyte. Keep in mind that the environment is temporary: it expires after four hours, and any executions or outputs it generates are not persisted. As a trial environment, the hosted sandbox makes it easy to experiment with Flyte and get a glimpse of what workflow orchestration can offer.

To help you better understand the capabilities of Flyte and the benefits of workflow orchestration, we've included a comprehensive step-by-step tutorial within the hosted sandbox environment. This tutorial explains Flyte's key features and functionalities and demonstrates how Flyte can help streamline your data and ML pipelines.

This article doesn't go into the nitty-gritty details of the tutorial. Instead, it describes the first-hand benefits of workflow orchestration that you can experience by trying out the hosted sandbox environment.

What orchestration entails

The first step in the tutorial is to migrate an existing ML pipeline into a Flyte workflow. This process can include:

  • Annotating inputs and outputs with appropriate types
  • Decorating Python functions with @task to turn them into tasks
  • Creating a workflow to establish dependencies between tasks
  • Utilizing Flyte constructs such as map tasks, dynamic workflows and conditionals to streamline execution and capture workflow dynamism

Although adding types may seem challenging at first, it is essential for data validation: type annotations let Flyte check the data passed between tasks. Without proper type annotations, debugging becomes harder, and errors may not surface until late in the pipeline.
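To make the value of annotations concrete, here is a minimal plain-Python sketch of catching a badly typed input before any work is done. It is an illustration only, not Flyte's type system: the `validated` decorator and the `scale` function are hypothetical names, and the check covers simple classes such as float and str rather than generics.

```python
import functools
import inspect
from typing import get_type_hints


def validated(fn):
    """Check arguments and the return value against fn's annotations.

    Hypothetical helper for illustration; supports plain classes only.
    """
    hints = get_type_hints(fn)
    signature = inspect.signature(fn)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            if expected is not None and not isinstance(value, expected):
                raise TypeError(
                    f"{fn.__name__}: '{name}' expected {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        result = fn(*args, **kwargs)
        expected = hints.get("return")
        if expected is not None and not isinstance(result, expected):
            raise TypeError(
                f"{fn.__name__}: return value expected {expected.__name__}, "
                f"got {type(result).__name__}"
            )
        return result

    return wrapper


@validated
def scale(value: float, factor: float) -> float:
    return value * factor


result = scale(value=2.0, factor=3.0)

try:
    scale(value="2", factor=3.0)  # wrong type: rejected before the body runs
except TypeError as err:
    message = str(err)
```

The failing call is rejected at the function boundary with a message that names the offending argument, which is the kind of early feedback typed inputs and outputs make possible.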

Parallelize pipelines

To optimize resource consumption and improve performance, Flyte runs tasks in parallel wherever it can. While a typical script runs code sequentially, Flyte executes tasks that don't depend on each other's outputs in parallel by default, with no extra configuration. Flyte's map tasks can accelerate processing further by splitting a larger computation into independent pieces that run in parallel. In the workflow below, train_model is mapped over the prepared arguments, with at most five instances running concurrently.

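from typing import List

from flytekit import map_task, workflow
from flytekit.types.pickle import FlytePickle

# get_data, process_data, prepare_train_args and train_model are assumed to be
# tasks defined elsewhere (for example, in the sandbox tutorial).
@workflow
def training_workflow(hp_grid: List[dict]) -> List[FlytePickle]:
    """Put all of the steps together into a single workflow."""
    data = get_data()
    processed_data = process_data(data=data)
    return map_task(train_model, concurrency=5)(
        args=prepare_train_args(hp_grid=hp_grid, data=processed_data)
    )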
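
If map tasks are new to you, a rough plain-Python analogy may help: applying one function to many independent inputs with a bounded pool of workers. The sketch below uses only the standard library and is not how Flyte schedules map tasks; `train_stub` and `run_mapped` are hypothetical names, and the "score" is a made-up calculation standing in for real training.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def train_stub(params: Dict[str, float]) -> float:
    """Hypothetical stand-in for a training task: returns a fake score."""
    return params["learning_rate"] * params["epochs"]


def run_mapped(hp_grid: List[Dict[str, float]], concurrency: int = 5) -> List[float]:
    """Apply train_stub to every grid entry, at most `concurrency` at a time."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Executor.map returns results in the same order as the inputs.
        return list(pool.map(train_stub, hp_grid))


hp_grid = [
    {"learning_rate": 0.5, "epochs": 2},
    {"learning_rate": 0.25, "epochs": 8},
]
scores = run_mapped(hp_grid)
```

Each grid entry is processed independently, so adding entries increases the amount of parallel work without changing the code that maps over them.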

Interact programmatically

Users can interact with Flyte programmatically from within Jupyter notebooks: they can register and execute workflows, then retrieve and inspect the outputs. This means you can benefit from Flyte's orchestration and scalability without leaving the convenience of your notebook environment, which is particularly useful if you prefer an API-driven workflow.

Visualize executions

The hosted sandbox provides a user interface for interacting with workflow executions. Via this interface, users can visualize and track their executions, and explore how teams and organizations can run multiple workflows simultaneously while maintaining separation of concerns.

Version executions

Versioning is an integral feature of Flyte. When you iterate on workflows, you don't need to specify a version because one is assigned automatically. It’s easy to forget that executions are versioned, since Flyte handles everything internally!

The hosted sandbox provides an opportunity to experience the benefits of version control even before delving into orchestration. This is particularly advantageous as versioning ensures reproducibility and lets users roll back to previous executions when necessary.
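
One way a tool can assign versions automatically is to derive them from the code itself, so that unchanged code maps to the same version and any edit produces a new one. The sketch below illustrates that idea in plain Python. It is an assumption made for illustration, not a description of how Flyte computes versions, and `derive_version` is a hypothetical name.

```python
import hashlib


def derive_version(source: str, length: int = 12) -> str:
    """Return a short, deterministic version string for a piece of source code."""
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    return digest[:length]


original = "def add(a, b):\n    return a + b\n"
edited = "def add(a, b):\n    return a + b + 1\n"

v1 = derive_version(original)
v2 = derive_version(edited)
v1_again = derive_version(original)  # same code, same version
```

Because the version depends only on the content, rerunning unchanged code is traceable to the same version, while every edit is distinguishable from what came before.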

While this article only scratches the surface of the benefits that Flyte workflow orchestration can provide, we hope that it has sparked your interest and encouraged you to try it out for yourself. You can discover and explore all the features that Flyte offers on our website.

<a href="https://sandbox.union.ai/" class="button w-button" target="_blank">Try Hosted Sandbox ↗</a>

If you're ready to take the next step and scale your Flyte workflows for production use, you can self-host Flyte in the cloud or on-premises. If you would rather use Flyte without worrying about infrastructure setup and constraints, you can find more information on the Union Cloud page.

Our comprehensive documentation is designed to guide you through every step of your Flyte journey, from getting started to advanced usage. And if you have any questions or want to connect with our team, our active Slack community is always there to help. We invite you to explore all the features that Flyte has to offer and experience the benefits of workflow orchestration. Don't forget to show your support by giving the Flyte repository a star!