Flyte 1.10: Monorepo, New Agents, Eager Workflows and More
We're delighted to present our latest release! September and October kept us busy as we developed several exciting features and fixed numerous bugs, improving the overall developer experience. Notable highlights of the Flyte 1.10 release include the monorepo and new Flyte agents. Let's delve into the details.
Monorepo
The backend development of Flyte has been transitioned to a monorepo. This transition includes the migration of repositories such as datacatalog, flyteadmin, flytecopilot, flyteplugins, flytepropeller and flytestdlib. Each of these components now resides as a top-level directory in the flyte repo. We believe that this change will significantly enhance the contribution experience, making it easier to test and merge changes into the backend code. In an upcoming blog post, we will be providing a more detailed explanation of why we opted for this monorepo structure, how we executed the migration, and what the development experience will look like.
New agents
Flyte 1.10 agents are not only more performant than ever; we also support more of them, including Airflow, MemVerge, Snowflake, Databricks and sensors!
Airflow
The Airflow agent enables the smooth execution of Airflow tasks within Flyte workflows without requiring any code changes. All Airflow tasks run on the agent (a long-running server) rather than in a newly launched pod per task, significantly reducing overhead.
This integration allows you to:
- Compile Airflow tasks into Flyte tasks
- Incorporate Airflow sensors/operators into Flyte workflows
- Support the local execution of Airflow tasks without requiring a cluster setup
To install the plugin, run the following command:
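```shell
pip install flytekitplugins-airflow
```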
Here's an example of an Airflow file sensor:
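The snippet below is a minimal sketch: the `say_hello` task and the watched `/tmp/1234` path are placeholders.

```python
from airflow.sensors.filesystem import FileSensor
from flytekit import task, workflow


@task
def say_hello():
    print("file has landed")


@workflow
def airflow_file_sensor_wf():
    # Wait for /tmp/1234 to appear, then run the downstream Flyte task.
    sensor = FileSensor(task_id="file_sensor", filepath="/tmp/1234")
    sensor >> say_hello()
```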
And here’s how you can define an Airflow time sensor:
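Another sketch, this time waiting for a point in time; the one-minute offset and the `notify` task are arbitrary.

```python
from datetime import datetime, timedelta, timezone

from airflow.sensors.time_sensor import TimeSensor
from flytekit import task, workflow


@task
def notify():
    print("target time reached")


@workflow
def airflow_time_sensor_wf():
    # The target time is computed when the workflow is compiled.
    target = (datetime.now(tz=timezone.utc) + timedelta(minutes=1)).time()
    sensor = TimeSensor(task_id="time_sensor", target_time=target)
    sensor >> notify()
```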
You can find more examples of the Airflow agent in this PR.
MemVerge
The MemVerge plugin facilitates the execution of Flyte tasks on the MemVerge Memory Machine Cloud (MMCloud). It supports resource requests and limits (CPU and memory), container images, and environment variable specifications. ImageSpec can be used to define the images for running tasks.
The following secrets need to be defined for the agent server:
- `mmc_address`: MMCloud OpCenter address
- `mmc_username`: MMCloud OpCenter username
- `mmc_password`: MMCloud OpCenter password
To install the plugin, use the following command:
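```shell
pip install flytekitplugins-mmcloud
```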
Here is an example showcasing the functionality of the MemVerge agent:
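The sketch below assumes the plugin's `MMCloudConfig` task config; the tasks themselves are toy examples.

```python
from flytekit import Resources, task, workflow
from flytekitplugins.mmcloud import MMCloudConfig


@task(
    task_config=MMCloudConfig(),  # run this task on MMCloud via the agent
    requests=Resources(cpu="1", mem="1Gi"),
    limits=Resources(cpu="2", mem="4Gi"),
)
def to_str(i: int) -> str:
    return str(i)


@task(task_config=MMCloudConfig())
def to_int(s: str) -> int:
    return int(s)


@workflow
def mmcloud_wf(i: int = 5) -> int:
    return to_int(s=to_str(i=i))
```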
Snowflake
The Snowflake agent enables the execution of a query on a Snowflake database, both locally and remotely.
To install the plugin, use the following command:
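```shell
pip install flytekitplugins-snowflake
```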
Here’s an example of the Snowflake agent:
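A sketch of a `SnowflakeTask` running a query against Snowflake's sample dataset; the account value is a placeholder, and the exact `SnowflakeConfig` fields may vary with your plugin version.

```python
from flytekit import kwtypes, workflow
from flytekitplugins.snowflake import SnowflakeConfig, SnowflakeTask

snowflake_task = SnowflakeTask(
    name="snowflake.sample_query",
    inputs=kwtypes(nation_key=int),
    task_config=SnowflakeConfig(
        account="<your-account>",  # placeholder
        database="SNOWFLAKE_SAMPLE_DATA",
        schema="TPCH_SF1000",
        warehouse="COMPUTE_WH",
    ),
    query_template="SELECT * FROM CUSTOMER WHERE C_NATIONKEY = %(nation_key)s LIMIT 100",
)


@workflow
def snowflake_wf(nation_key: int = 10):
    snowflake_task(nation_key=nation_key)
```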
Databricks
The Databricks agent can be used to submit Spark jobs to the Databricks platform.
To install the plugin, run the following command:
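```shell
# The Databricks task config ships with the Spark plugin.
pip install flytekitplugins-spark
```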
Here is an example showcasing the functionality of the Databricks agent:
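A sketch of a Spark task submitted to Databricks; the workspace URL, cluster spec and the toy row-counting job are illustrative.

```python
import flytekit
from flytekit import task, workflow
from flytekitplugins.spark import Databricks


@task(
    task_config=Databricks(
        spark_conf={"spark.driver.memory": "1000M"},
        databricks_conf={
            "run_name": "flyte databricks example",
            "new_cluster": {
                "spark_version": "12.2.x-scala2.12",
                "node_type_id": "n2-highmem-4",
                "num_workers": 1,
            },
            "timeout_seconds": 3600,
        },
        databricks_instance="<workspace>.cloud.databricks.com",  # placeholder
    )
)
def count_rows() -> int:
    # The Spark session is injected by the plugin at runtime.
    sess = flytekit.current_context().spark_session
    return sess.sparkContext.parallelize(range(100)).count()


@workflow
def databricks_wf() -> int:
    return count_rows()
```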
Base and file sensors
This feature was introduced in the v1.9.1 patch release.
Sensors are valuable for waiting for specific events to occur. You can inherit the `BaseSensor` class to create a custom sensor in Flyte. Here's an example of a file sensor:
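A minimal sketch using flytekit's built-in `FileSensor`; the S3 path and the downstream task are placeholders.

```python
from flytekit import task, workflow
from flytekit.sensor.file_sensor import FileSensor

sensor = FileSensor(name="my_file_sensor")


@task
def process_file():
    print("file is available")


@workflow
def file_sensor_wf(path: str = "s3://my-bucket/my-file.txt"):
    # Block until the file exists, then run the downstream task.
    sensor(path=path) >> process_file()
```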
Pyflyte ergonomic improvements
- The `pyflyte run remote-launchplan` command lets you execute launch plans registered on the server directly from the CLI.
- The `pyflyte run` command now supports all launch plan parameters, including labels, annotations, service accounts and tags. Labels and annotations serve as metadata attachments for objects in Kubernetes, while service accounts provide identities for Kubernetes pods. Tags correspond to the tags that can be set for an execution.
- Use pyflyte to activate and deactivate launch plans.
- Use pyflyte to interact with gate nodes in local executions. This lets you debug and test workflows that use gate nodes without running them on a Flyte cluster.
- Beautified output of the `pyflyte run` command (introduced in the v1.9.1 patch release)
Programmatically consume inputs and outputs
The Flyte UI now displays FlyteRemote code snippets that illustrate how to access the inputs and outputs of an execution. You can conveniently copy and paste these snippets to retrieve data from your execution.
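A sketch of what such a snippet looks like; the execution name, project and domain below are placeholders you would copy from the UI.

```python
from flytekit.configuration import Config
from flytekit.remote import FlyteRemote

# Connect to your Flyte cluster.
remote = FlyteRemote(
    Config.auto(),
    default_project="flytesnacks",
    default_domain="development",
)

# "f8a9b2c3d4" is a placeholder execution name.
execution = remote.fetch_execution(
    name="f8a9b2c3d4", project="flytesnacks", domain="development"
)
execution = remote.sync_execution(execution)

print(execution.inputs)
print(execution.outputs)
```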
Eager workflows
This feature was introduced in the v1.9.1 patch release.
Unlike static and dynamic workflows, eager workflows enable the use of familiar Python constructs through the `asyncio` API. To illustrate this, here's a simple eager workflow using the `@eager` decorator.
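The tasks below are illustrative; the control flow is ordinary Python.

```python
from flytekit import task
from flytekit.experimental import eager


@task
def add_one(x: int) -> int:
    return x + 1


@task
def double(x: int) -> int:
    return x * 2


@eager
async def simple_eager_workflow(x: int) -> int:
    # Task outputs are awaited and can drive plain Python conditionals.
    out = await add_one(x=x)
    if out < 0:
        return -1
    return await double(x=out)
```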
When you decorate a function with `@eager`, any function invoked within it that’s decorated with `@task`, `@workflow`, or `@eager` becomes an awaitable object within the lifetime of the parent eager workflow execution. Note that this happens automatically and you don’t need to use the `async` keyword when defining a task or workflow that you want to invoke within an eager workflow.
What can you do with eager workflows?
- Operate on task and sub-workflow outputs
- Define Python conditionals
- Define loops
- Invoke static workflows
- Nest eager subworkflows
- Catch exceptions
Local entrypoint and support for offloaded types
New in v1.10.0 release.
We have added a new feature that enables the execution of an eager workflow locally with the `local_entrypoint` argument, with the tasks or sub-workflows being run remotely. Moreover, all offloaded types such as FlyteFile, FlyteDirectory and StructuredDataset will materialize as Python values, fully downloaded into the pod.
When you specify `local_entrypoint=True`, the eager workflow literally becomes a local entrypoint to the configured `FlyteRemote` cluster. This feature is designed for you to iterate much more quickly in a local environment so that you can leverage the power of your Flyte cluster when needed and materialize any data locally so that you can debug, develop and experiment more easily.
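A sketch of the pattern; the `FlyteRemote` configuration and the `add_one` task are placeholders for your own cluster and code.

```python
import asyncio

from flytekit import task
from flytekit.configuration import Config
from flytekit.experimental import eager
from flytekit.remote import FlyteRemote


@task
def add_one(x: int) -> int:
    return x + 1


@eager(
    remote=FlyteRemote(
        Config.auto(),  # point this at your own cluster
        default_project="flytesnacks",
        default_domain="development",
    ),
    local_entrypoint=True,
)
async def eager_wf(x: int) -> int:
    # add_one runs on the configured cluster; its output is materialized locally.
    out = await add_one(x=x)
    return out


if __name__ == "__main__":
    print(asyncio.run(eager_wf(x=1)))
```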
You can access the documentation on eager workflows here.
FlyteDirectory batch upload
This feature was introduced in the v1.9.1 patch release.
Optimize memory consumption by defining the batch size during FlyteDirectory upload or download. This designated batch size will be utilized to process the directory in manageable chunks during the upload or download process.
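A sketch, assuming the `BatchSize` annotation exported by flytekit; the file-counting task is illustrative.

```python
import os
from typing import Annotated

from flytekit import BatchSize, task
from flytekit.types.directory import FlyteDirectory


@task
def count_files(d: Annotated[FlyteDirectory, BatchSize(100)]) -> int:
    # The directory is downloaded in chunks of 100 files, bounding peak memory use.
    local_dir = d.download()
    return len(os.listdir(local_dir))
```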
Pydantic type transformer
Pydantic is a popular Python data validation library that provides a flexible approach for users to define custom types, mapping field names to a range of value types, much like data classes. With this integration, Pydantic base models can now be utilized as inputs and outputs in Flyte tasks.
To install the plugin, use the following command:
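```shell
pip install flytekitplugins-pydantic
```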
The following is an example of the Pydantic integration:
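A minimal sketch; the `TrainConfig` model and `train` task are illustrative.

```python
from pydantic import BaseModel

import flytekitplugins.pydantic  # noqa: F401  (importing the plugin registers the transformer)
from flytekit import task, workflow


class TrainConfig(BaseModel):
    lr: float = 1e-3
    batch_size: int = 32
    epochs: int = 10


@task
def train(cfg: TrainConfig) -> str:
    return f"trained for {cfg.epochs} epochs at lr={cfg.lr}"


@workflow
def training_wf(cfg: TrainConfig) -> str:
    return train(cfg=cfg)
```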
Mashumaro to serialize/deserialize dataclasses
In contrast to dataclasses-json, Mashumaro speeds up the serialization and deserialization of dataclasses, up to 1.5 times faster for more substantial workloads. For a comprehensive analysis of the performance benchmarks, please refer to the detailed breakdown provided here.
Other enhancements
- In the flytekit `conditional()` block, you can now check for `None` using `is_none()`; `is_true()` and `is_false()` were already supported. See the example sketched after this list
- Azure workload identity can now be used for `fsspec` in flytekit, allowing flytekit to make use of AKS-enabled workload identities. Previously, reading and writing data to a storage account was only possible by setting environment variables for the storage account key. Set `anon` to `False` to use the default credentials provided by AKS workload identity
- The new version adds support for `admin.clientSecretEnvVar` in the flytectl config.yaml file for use by flytekit
- We’ve provided an alternative deployment guide for Flyte on Google Cloud Platform (GCP) leveraging the Google Compute Engine (GCE) ingress controller, GCP managed TLS certificates and GCP Identity Aware Proxy with the goal of implementing a zero-trust access model
- The `pyflyte backfill` command now supports `WorkflowFailurePolicy.FAIL_AFTER_EXECUTABLE_NODES_COMPLETE` via the `--no-fail-fast` option. By default, the backfill fails immediately if any backfill step fails; with `--no-fail-fast`, it continues to run even if some of the backfill steps fail
- The MLFlow plugin now works with Python 3.11
- Version 1.10 adds support for Azure Blob Storage for storing metadata and raw data, including structured datasets, without disrupting other standard Azure authentication methods. Ensure that `storage_options` are set consistently for all uses of `fsspec`
- The `enable_deck` option is now available in the `@task` decorator, enabling the viewing of decks in Flyte tasks
- We’ve added an image type transformer to support `PIL.Image`; it’s now a valid type in flytekit
- `FlyteRemote.execute()` now allows execution name prefixes, enabling the launch of multiple executions with the same execution name prefix. Under the hood, a UUID is appended to the execution name (introduced in the v1.9.1 patch release)
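Here is a minimal sketch of the `is_none()` check mentioned above; the tasks are illustrative.

```python
from typing import Optional

from flytekit import conditional, task, workflow


@task
def maybe_value(flag: bool) -> Optional[int]:
    return 42 if flag else None


@task
def handle_none() -> str:
    return "no value produced"


@task
def handle_value(x: Optional[int]) -> str:
    return f"got {x}"


@workflow
def check_none_wf(flag: bool = True) -> str:
    x = maybe_value(flag=flag)
    return (
        conditional("check_none")
        .if_(x.is_none())
        .then(handle_none())
        .else_()
        .then(handle_value(x=x))
    )
```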
You can find the detailed release notes here.
Docs improvements
- Added examples of eager workflows to the documentation.
- Restructured the documentation for a more streamlined developer experience.
- Revamped the basics section within the documentation.
- Documented the integration of the MMCloud agent.
1.10 contributors
We extend our heartfelt gratitude to all the contributors who have made invaluable contributions to Flyte 1.10. Thank you for your dedication and support!
{{contributors-1-10="/blog-component-assets"}}
We highly value the feedback of our users and community members, which helps us to improve our product continuously. To connect with other users and get support from our team, we encourage you to join our Slack channel. For updates on product development, community events, and announcements, follow us on Twitter to join the conversation and share your thoughts.
In case you encounter any issues or bugs, we request you let us know by creating a GitHub issue. If you find Flyte useful, don't forget to ⭐ on GitHub.