Niels Bantilan

Fine-tuning Large Language Models with Declarative ML Orchestration

Using Flyte to unlock the potential of reproducible workflows for fine-tuning model training workloads

I entered the machine learning space more than a decade ago, right around the time AlexNet was created. Since then, I’ve witnessed innovation after innovation in deep learning model architectures, all powered by the backpropagation algorithm. In the long run, the only sustainable strategy I’ve found that helps me keep up with all the developments is to pick two to three techniques; read their corresponding papers; and (if possible) implement toy versions of these methods or read others’ code implementations. Then I’ll try to place them in the context of the broader trajectory of ML — bearing in mind that specific methods come and go, but the ability to learn and understand these methods is the primary skill to value in today’s pace of ML progress.

I took my latest deep dive to understand the various methods for fine-tuning Large Language Models (LLMs), and I’d like to share my learnings with you! I’ll be running a workshop to provide a hands-on primer for fine-tuning LLMs in practice. My workshop, scheduled for the Toronto Machine Learning Summit (TMLS) conference from June 12 to 14, will focus on reasons to fine-tune a pre-trained base LLM and how to do so using Flyte as an ML orchestrator.

What the workshop will cover

The Rationale for Fine-Tuning

Fine-tuning lets us specialize a base model for a specific data distribution by adjusting its weights. This practice is valuable in scenarios where data privacy is a concern and the use of an open-source base model is preferable. Even when data privacy isn't a top priority, we often turn to fine-tuning if prompt engineering on a powerful base model doesn't yield the desired results. I’ve written more about this trade-off in this post comparing fine-tuning and prompt engineering, and in this workshop I hope to give you a practical framework for deciding between the two.

Prompt engineering holds the model fixed while updating the inputs, while fine-tuning updates the model weights directly to get the desired output.
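To make this distinction concrete, here's a minimal sketch using the `transformers` library; the model and prompt are illustrative placeholders, not what we'll use in the workshop:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# Prompt engineering: the weights stay frozen; we only reshape the input.
prompt = (
    "You are a concise support agent.\n"
    "Customer: My order hasn't arrived.\n"
    "Agent:"
)
print(generator(prompt, max_new_tokens=40)[0]["generated_text"])

# Fine-tuning, by contrast, updates the model's weights on task-specific
# examples (e.g., with transformers' Trainer), so the desired behavior is
# baked into the model rather than coaxed out through the prompt.
```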

Methods of Fine-Tuning LLMs

The workshop will explore three strategies for fine-tuning LLMs, each calling for a different shape of training data (sketched in the example after this list):

  1. Continued pre-training (CPT): Extending pre-training on an additional set of tokens specific to your problem domain.
  2. Supervised fine-tuning (SFT): Providing examples of the desired task in the form of prompt-response pairs to make your LLM feel a little more like a chatbot.
  3. Reinforcement Learning from Human Feedback (RLHF): Using human preference data to train a model that can generate responses more closely aligned with human expectations.
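To make these strategies concrete, here's a rough sketch of what a single training example might look like under each one. The field names and contents are illustrative, not a prescribed schema:

```python
# Continued pre-training: raw, unlabeled domain text.
cpt_example = {"text": "Flyte workflows are composed of tasks, each of which ..."}

# Supervised fine-tuning: a prompt-response pair demonstrating the task.
sft_example = {
    "prompt": "How do I cache a Flyte task?",
    "response": "Pass cache=True and a cache_version to the @task decorator.",
}

# RLHF: a prompt with a preferred and a rejected response, used to train
# a reward model that then guides further fine-tuning.
rlhf_example = {
    "prompt": "Explain what an ML orchestrator does.",
    "chosen": "An orchestrator schedules units of computation, tracks the "
              "data flowing between them, and provisions their resources.",
    "rejected": "It runs stuff.",
}
```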

The Role of Declarative ML Orchestration

ML orchestration isn’t just about DAGs (directed acyclic graphs); it’s also about ensuring that the right computation is done on valid data at the correct time using the appropriate underlying infrastructure. Declarative ML orchestration, in particular, helps you reason about:

  • The units of computation involved in your overall computation graph
  • How data is flowing between those units
  • What exactly those data comprise at any given time
  • The dependencies each unit needs to run its computation
  • The resources available to each unit 

And it helps you to reason about these things while abstracting away the implementation details of where, when and on what infrastructure the nodes on the graph execute. Orchestrating ML workflows when fine-tuning LLMs ensures each unit of compute is reproducible, repeatable and resource-efficient. An orchestrator like Flyte, which we will be using in the workshop, provides a flexible and powerful platform that unifies data, ML and analytics stacks.
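As a preview of what this looks like in code, here's a minimal Flyte sketch; the task bodies, dataset source, and resource sizes are placeholders for the real fine-tuning logic we'll build in the workshop:

```python
from typing import List

from flytekit import Resources, task, workflow

@task(cache=True, cache_version="1.0")
def prepare_data(dataset_url: str) -> List[str]:
    # A real task would download and tokenize the corpus here.
    return [f"document fetched from {dataset_url}"]

@task(requests=Resources(gpu="1", mem="24Gi"))
def fine_tune(documents: List[str], base_model: str) -> str:
    # Training would run here; we return a placeholder model path.
    return f"/models/{base_model.replace('/', '-')}-finetuned"

@workflow
def finetune_wf(
    dataset_url: str,
    base_model: str = "openlm-research/open_llama_7b",
) -> str:
    documents = prepare_data(dataset_url=dataset_url)
    return fine_tune(documents=documents, base_model=base_model)
```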

Using Flyte to Fine-Tune LLMs

During the workshop, participants will use Flyte to apply several fine-tuning strategies to a range of base models. I’ll specifically focus on datasets for CPT and SFT using open-source base models like LLaMA and RedPajama. We’ll dive deep into the details of fine-tuning methods like the Zero Redundancy Optimizer (ZeRO), parameter-efficient fine-tuning, and 8-bit optimization, and see how we can leverage libraries like `deepspeed`, `peft`, and `bitsandbytes` to integrate them conveniently into a Flyte workflow. We’ll also walk through a typical workflow, starting from data acquisition and culminating in deployment to HuggingFace Spaces or batch inference.
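As a flavor of how these libraries fit together, parameter-efficient 8-bit fine-tuning with `peft` and `bitsandbytes` might look roughly like this; the checkpoint and LoRA hyperparameters are placeholders, not the workshop's exact configuration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "togethercomputer/RedPajama-INCITE-Base-3B-v1"
tokenizer = AutoTokenizer.from_pretrained(base_model)

# bitsandbytes quantizes the frozen base weights to 8 bits at load time.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=True,
    device_map="auto",
)

# LoRA trains small low-rank adapter matrices instead of the full weights.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # attention projection in GPT-NeoX-style models
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```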

In this workshop, you’ll learn how to build a fine-tuning workflow that publishes models to the HuggingFace model hub for online inference, as well as a Flyte workflow for batch inference.
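The publishing step at the end of such a workflow could be as simple as the following sketch; the `publish_model` task and repo ID are hypothetical, and we assume a HuggingFace token is available in the task's environment:

```python
from flytekit import task

@task
def publish_model(model_dir: str, repo_id: str) -> str:
    from huggingface_hub import HfApi

    api = HfApi()  # picks up the HF token from the environment
    api.create_repo(repo_id, exist_ok=True)
    api.upload_folder(folder_path=model_dir, repo_id=repo_id)
    return f"https://huggingface.co/{repo_id}"
```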

Along the way, we'll learn about Flyte capabilities that make writing this entire system in Python very easy (see the `ImageSpec` sketch after this list), such as:

  • Built-in caching and cache versioning to avoid repeat computation
  • Declarative resource requests to get node-level GPU access
  • PyTorch Elastic plugin for distributed training
  • Pandera plugin for data validation
  • Flyte Decks for task observability
  • Checkpointing with the transformers FlyteCallback
  • Declarative dependencies with `ImageSpec`
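Here's what that last item looks like in practice, as a minimal sketch: dependencies are declared right in Python and Flyte builds the container image for you. The registry and package list are placeholders:

```python
from flytekit import ImageSpec, task

finetune_image = ImageSpec(
    name="llm-fine-tuning",
    packages=["transformers", "peft", "bitsandbytes", "deepspeed"],
    registry="ghcr.io/your-org",  # placeholder registry
)

@task(container_image=finetune_image)
def smoke_test() -> str:
    # Runs inside the auto-built image, so the import just works.
    import transformers

    return transformers.__version__
```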

Join the workshop!

Whether you're new to machine learning or an experienced practitioner, this workshop will give you the conceptual understanding to decide whether fine-tuning will work for you and, if so, how to do it using modern ML libraries and orchestrators.

Details

  • 👉 Register here
  • 🗓 When: June 12, noon to 1:30 p.m. EDT
  • 📍 Workshop type: virtual

If you missed out, feel free to check out the workshop recording.

Additional workshop resources