What’s new in Flyte 1.13.2 and flytekit 1.13.6?
Flyte 1.13.2 and flytekit 1.13.6 are feature-rich releases that bring a host of improvements to enhance efficiency, flexibility, and user experience. This blog post summarizes the key new features and bug fixes.
Flyte 1.13.2 highlights
Agents enabled by default on sandbox instances
With the goal of delivering a powerful development environment for contributors and users, Flyte sandbox (`flytectl demo start`) now ships with the Agent service enabled by default. That, combined with the Agent Watcher already part of Flyte, means that whenever you request a resource that comes from an Agent (for example, a BigQuery Job) there’s no need for extra setup. Propeller will obtain the metadata it needs to trigger the execution on the relevant Agent, including the Agent name, supported Task type and version.
- If you’re running Flyte sandbox in production and not using Agents, consider disabling the option by removing the relevant line from the config file:
flytepropeller now accounts for etcd errors to trigger backoff
Flyte incorporates multiple mechanisms to handle high concurrency and extremely high load. As some users have noted, the first bottleneck they usually hit under high load is the Kubernetes API server, and `flytepropeller` includes logic to remain “gentle” when the K8s control plane is struggling. If load in the cluster is so high that the Pod Informer caches are inconsistent and, as a consequence, `etcd` rejects writes as outdated, `flytepropeller` uses its local copy of the `ResourceVersion` to continue the evaluation loop. This keeps it from repeatedly requesting `etcd` operations that would be rejected and would only increase the load further.
This release introduces an additional mechanism: if `flytepropeller` detects etcd-related errors in the K8s API Server, it triggers exponential backoff, further alleviating the load on both the K8s API server and `etcd`, thus increasing resiliency and fault tolerance in the face of transient errors.
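To make the idea concrete, here is a minimal Python sketch of exponential backoff around a retryable error. It is an illustration only, not `flytepropeller`'s implementation (which is written in Go); `TransientStoreError`, `call_with_backoff`, and the default delays are hypothetical names and values chosen for this example.

```python
import time


class TransientStoreError(Exception):
    """Hypothetical stand-in for an etcd-related error surfaced by the API server."""


def call_with_backoff(operation, is_retryable, sleep=time.sleep,
                      base=0.5, factor=2.0, cap=30.0, max_attempts=6):
    """Run `operation`, waiting exponentially longer after each retryable error."""
    delay = base
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as err:
            # Give up on non-retryable errors or once attempts are exhausted.
            if not is_retryable(err) or attempt == max_attempts:
                raise
            sleep(delay)
            # Double the wait each time, but never exceed the cap.
            delay = min(cap, delay * factor)
```

Because each wait is longer than the previous one, a struggling backend receives progressively fewer requests from the same caller, which gives it room to recover.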
Echo plugin enabled by default
The echo plugin, which was previously opt-in, is now enabled by default. This plugin introduces an `echo` task, providing a simple way to test Flyte workflows and tasks. The echo task simply copies its input to its output and does not require a separate container to run. It is designed to improve workflow efficiency by allowing conditional branches to be skipped without needing a placeholder no-op task (see the example).
SMTP support for email notifications
Users can now configure Flyte to send notifications via SMTP. Given that many enterprises rely on SMTP servers for email communication instead of external services like SendGrid, this feature provides support for a more traditional way to receive alerts and updates about Flyte executions. This is an example configuration for a Flyte sandbox instance using the SMTP interface provided by AWS SES:
And how to use it from a task definition:
Bug Fixes
This release addresses several bugs to provide a more stable and reliable user experience. Here are some notable fixes:
- The execution ID label, previously emitted by default in single binary, has been disabled to prevent potential `OOMKilled` events when the `flyte-binary` Pod is under high load. This also reduces the cardinality of exported metrics, which helps alleviate the load on monitoring systems like Prometheus.
- The error message for `MismatchingTypes` has been improved to provide clearer guidance on resolving type mismatch issues.
- The `flyte-core` Helm chart now supports caching configuration, improving the deployment and scaling process of Flyte. With this change, configuration values for `storage.cache.max_size_mbs` and `storage.cache.target_gc_percent` will be picked up correctly.
- Execution name readability has been enhanced to make it easier to identify and manage executions.
- The `imagePullPolicy` is now set to `Always` in the Flyte sandbox environment, ensuring that the latest container images are used.
- To enhance consistency, the generation of default execution names has been moved to `flyteadmin`.
- The scheduler now uses deterministic execution names, improving the reliability and predictability of workflow executions.
- An issue where flytectl, the command-line interface for Flyte, returned the oldest workflow when using the `--latest` flag has been resolved.
- The use of explicit Go toolchain versions has been removed to streamline the development process and avoid compatibility problems.
- A new listing API has been added to `stow` storage, a storage backend for Flyte, providing better management and visibility of stored artifacts.
- A crash in `flytepropeller` that occurred when inferring the literal type for an offloaded literal has been fixed.
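As an illustration of the deterministic execution names mentioned in the list above, the sketch below derives a name from a schedule identifier and the scheduled time, so that the same trigger always maps to the same name. This is a hedged example of the general technique, not the scheduler's actual algorithm; `scheduled_execution_name`, the hashing scheme, and the `f` prefix are assumptions made for this example.

```python
import hashlib
from datetime import datetime, timezone


def scheduled_execution_name(schedule_id, scheduled_at, length=20):
    """Derive the same name for the same (schedule, scheduled time) pair.

    `scheduled_at` must be a timezone-aware datetime.
    """
    # Normalize to UTC so equal instants hash identically in any timezone.
    stamp = scheduled_at.astimezone(timezone.utc).isoformat()
    digest = hashlib.sha256(f"{schedule_id}:{stamp}".encode()).hexdigest()
    # Hypothetical convention: start with a letter, then hex characters.
    return "f" + digest[: length - 1]
```

With a scheme like this, a retried trigger for the same scheduled time produces the same name, so a duplicate can be detected instead of launching a second execution.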
Documentation enhancements
- ImageSpec: additional examples and updated explanations.
- Agents setup: updates to multiple agent guides including Databricks, secrets management, and agents development.
To learn more, read the full 1.13.2 release notes.
Flytekit v1.13.6 release highlights
Flytekit v1.13.6 brings significant updates to enhance user experience and platform extensibility. Here are the key highlights:
1. Remote Workflow Registration Fix
Resolved issues around registering remote workflows, improving the robustness of the registration process. This includes adding a `register_workflow_script_mode` function, which leverages `register_scripts`, enabling frictionless remote executions with a relative source path.
2. Streamlined fast registration
Fast registration in Flytekit is a useful pattern to consider when you already have a container image and you change your workflow code without any changes in your system-level/Python dependencies.
Different Flytekit commands, like `pyflyte run`, `pyflyte register`, and `pyflyte package`, had different default behaviors and used different flags for fast registration.
This release introduces a new, unified `--copy` flag (marked as beta) with three options: `all`, `auto`, and `none`. It streamlines registering and running scripts by simplifying the flag system and aligning behavior across commands. The `--copy auto` option detects which Python modules are loaded and includes only those in the upload, reducing the size of the uploaded files. This change also modifies the way files are added to the archive, which may affect performance for folders with a large number of files. The existing flags will continue to work until they are deprecated in favor of the new mechanism.
Additionally, FlyteRemote now exposes `--copy-all` for programmatic remote fast registration, and Spark tasks can now use fast registration too. This avoids situations where Spark executors report a misleading `Module not found` error when fast registering a workflow.
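The "include only loaded modules" idea behind `--copy auto` can be sketched with the standard library alone. The snippet below is an approximation for illustration, not flytekit's implementation; `loaded_source_files` is a hypothetical helper that inspects `sys.modules` and keeps only files located under a given source root.

```python
import os
import sys


def loaded_source_files(source_root, modules=None):
    """Return sorted paths of loaded modules whose files live under `source_root`."""
    root = os.path.realpath(source_root)
    modules = sys.modules if modules is None else modules
    found = set()
    for module in list(modules.values()):
        path = getattr(module, "__file__", None)
        if not path:
            continue  # built-in and namespace modules have no file
        path = os.path.realpath(path)
        try:
            # True only when `path` is located inside `root`.
            inside = os.path.commonpath([root, path]) == root
        except ValueError:
            inside = False  # e.g. paths on different drives on Windows
        if inside:
            found.add(path)
    return sorted(found)
```

Calling `loaded_source_files(".")` after importing a workflow module would list only the project files that were actually imported, which is why an archive built this way can be smaller than one that copies the whole directory. Note that this simple filter would also pick up a virtual environment stored inside the source root.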
3. Plugin and Agent Enhancements
Added support for MotherDuck integration with DuckDB, so that a `DuckDBQuery` can query the MotherDuck data warehouse and in-memory files at the same time. Also, the flytekit-inference plugin now includes support for Ollama, a robust solution for local model serving. The Ollama container can be easily deployed as a sidecar service, allowing users to invoke the model's endpoint as if it were hosted locally. Additionally, Flyte can manage data pre-processing and post-processing, facilitating the creation of end-to-end batch inference pipelines.
To learn more, read the full flytekit 1.13.6 release notes.
New contributors
We extend our heartfelt gratitude to all the contributors who have made invaluable contributions to Flyte 1.13.2 and flytekit 1.13.6 for the first time. Thank you for your dedication and support!
We highly value the feedback of our users and community members, which helps us continuously improve our product. To connect with other users and get support from our team, we encourage you to join our Slack channel, go to the #contribute channel, or pick up an open issue to start contributing!