A High-Level Overview of Parameter-Efficient Fine-Tuning (PEFT)

Jed Lee
5 min read · Aug 31, 2023


How to Fine-Tune your LLM without extensive computational costs


Content of this Article

  1. Introduction to Parameter-Efficient Fine-Tuning (PEFT)
  2. What was the Traditional Way of Fine-tuning before PEFT?
  3. What Pain Points Did PEFT Tackle?
  4. PEFT vs Few-Shot In-Context Learning (ICL)
  5. Conclusion

Introduction to Parameter-Efficient Fine-Tuning (PEFT)

PEFT, or Parameter-Efficient Fine-Tuning, is a fairly recent and innovative technique in the field of Natural Language Processing. Instead of fine-tuning an entire pre-trained model, PEFT identifies and adjusts only the most crucial model parameters relevant to a specific task. This technique essentially accelerates the fine-tuning of large models while consuming less memory.

PEFT gained prominence when a group of Microsoft researchers proposed LoRA, Low-Rank Adaptation of Large (Language) Models, in June 2021. Since then, the momentum has been further propelled by the subsequent emergence of techniques such as IA3 in May 2022 and Quantized LoRA (QLoRA) and AuT-Few, both in May 2023. These advancements have collectively set new benchmarks for efficiency within the NLP domain and have, in some cases, enabled models to perform even better than humans.
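To make the idea concrete, here is a minimal NumPy sketch of the low-rank update at the heart of LoRA (a simplified illustration with made-up sizes, not the paper's actual implementation): the large pre-trained weight W stays frozen, and only two small matrices A and B are trained, with their product B @ A forming the task-specific update.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 768, 8                            # hidden size and LoRA rank (illustrative values)
W = rng.standard_normal((d, d))          # frozen pre-trained weight
A = rng.standard_normal((r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, initialised to zero

# Effective weight during/after fine-tuning: W stays frozen, only A and B train.
W_adapted = W + B @ A

full_params = W.size                     # parameters a full fine-tune would update
lora_params = A.size + B.size            # parameters LoRA actually trains
print(f"full: {full_params:,}  lora: {lora_params:,}  "
      f"ratio: {lora_params / full_params:.2%}")
```

Because B starts at zero, the adapted weight initially equals the frozen one, so fine-tuning begins exactly from the pre-trained behaviour; this zero-initialisation of one factor is the same choice the LoRA paper makes.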

What was the Traditional Way of Fine-tuning before PEFT?

Let’s first understand what fine-tuning is.


Fine-tuning involves modifying a pre-trained model, such as BERT, which has been initially trained on extensive datasets, to cater to specific tasks using a more targeted dataset. This capitalizes on the model’s broad foundational knowledge, allowing it to specialize in niche domains. For instance, by fine-tuning a model with legal documents, such as contracts or court proceedings, it can discern legal intricacies, pinpoint potential contract issues, or even predict trial outcomes from past cases.

Extrapolate this to diverse domains and other use cases, such as medical diagnostics or financial forecasting, and it becomes clear why fine-tuning is so crucial.

Before the advent of PEFT, the norm was comprehensive model fine-tuning. This approach took the pre-trained model and retrained it in its entirety using a task-specific dataset. While effective in achieving task proficiency, it is resource-intensive, especially for Large Language Models. In addition, it poses a risk of overfitting when the task-specific dataset is small and lacks the diversity of the original pre-training dataset.
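To see the scale of the difference, here is a toy parameter-accounting sketch (layer names and sizes are invented for illustration, loosely inspired by a BERT-base-like layout): traditional fine-tuning marks every tensor trainable, while a PEFT-style setup freezes the base model and trains only small adapter tensors.

```python
# Toy layer inventory: (name, parameter count); sizes are made up for illustration.
base_layers = [("embeddings", 30_000 * 768)] + [
    (f"block_{i}.attn", 4 * 768 * 768) for i in range(12)
] + [(f"block_{i}.mlp", 2 * 768 * 3072) for i in range(12)]

# Hypothetical rank-8 LoRA adapters attached to each attention block.
adapters = [(f"block_{i}.attn.lora", 2 * 8 * 768) for i in range(12)]

full_trainable = sum(n for _, n in base_layers)   # traditional fine-tuning
peft_trainable = sum(n for _, n in adapters)      # base frozen, adapters only

print(f"full fine-tune:  {full_trainable:>12,} trainable params")
print(f"PEFT (adapters): {peft_trainable:>12,} trainable params")
print(f"fraction trained: {peft_trainable / full_trainable:.3%}")
```

Even in this toy accounting, the adapter route trains well under 1% of the parameters the full fine-tune would touch, which is where the memory and speed savings come from.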

PEFT provides an elegant solution by targeting and refining only the most pivotal parameters.

What Pain Points Did PEFT Tackle?

  1. Catastrophic Forgetting: Catastrophic forgetting is a major challenge in continual learning, as extensively fine-tuned models risk erasing prior, generalized knowledge. A February 2021 paper underscored this issue in neural networks. Essentially, as a model learns new information, the neural weights adjust, potentially overshadowing or “forgetting” what was previously learned. This phenomenon not only impedes the model’s adaptability but also limits its practical application across varied tasks. PEFT’s strategy of selectively freezing certain portions of the model serves as a countermeasure, ensuring that foundational knowledge is retained while new, task-specific information is incorporated.
  2. Computational Efficiency: As mentioned, fine-tuning large-scale models requires considerable computational resources. PEFT reduces this need by optimizing only a subset of parameters, making it feasible for smaller research groups or businesses without extensive infrastructure.
  3. Overfitting: Overfitting is a pitfall when a model, especially after extensive fine-tuning on limited data, becomes too tailored to that data and performs poorly on new, unseen data. PEFT’s selective parameter tuning mitigates this risk.
  4. Scalability: As models grow in size and complexity, traditional fine-tuning methods become even more computationally expensive. PEFT offers a scalable solution that can be applied to future models without linearly increasing computational costs.

PEFT vs Few-Shot In-Context Learning (ICL)

At the intersection of modern NLP advancements lie two approaches: PEFT (Parameter-efficient Fine-tuning) and Few-Shot In-Context Learning (ICL), both designed to leverage pre-trained models for specialized tasks using minimal adjustments.

A 2022 paper from UNC Chapel Hill titled “Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning” delineates these strategies’ strengths and limitations.

ICL operates on a simple yet powerful premise: it introduces a task to the model by feeding it a limited set of task-relevant examples as input. Unlike traditional methods that adjust the model’s weights based on these examples, ICL utilizes them directly as context to guide the model’s predictions. However, this approach has a drawback. For every new prediction or task, the model needs to reprocess these examples, making it relatively resource-heavy and potentially slower in real-time applications.

PEFT, on the other hand, embodies a more nuanced approach to task adaptation. Instead of presenting examples each time, it tweaks or adjusts a small subset of the model’s parameters to align with the target task. This results in a model that’s tailored for the task while retaining most of its original, pre-trained knowledge. The advantage here is twofold: it’s computationally more efficient, as it avoids reprocessing examples, and it can, in certain scenarios, yield more consistent performance and generalize better across different tasks due to its structured adjustments.
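A back-of-the-envelope calculation makes the cost difference tangible (all token counts below are illustrative, not measurements): with k in-context examples of roughly fixed length, ICL resends and reprocesses those examples on every request, while a PEFT-adapted model pays only for the query itself.

```python
def tokens_per_request(query_tokens: int, k_examples: int, example_tokens: int) -> dict:
    """Rough per-request prompt sizes for ICL vs a PEFT-adapted model."""
    return {
        "icl": k_examples * example_tokens + query_tokens,  # examples resent each time
        "peft": query_tokens,                               # task knowledge lives in the weights
    }

costs = tokens_per_request(query_tokens=50, k_examples=32, example_tokens=60)
print(costs)  # {'icl': 1970, 'peft': 50}
print(f"ICL processes {costs['icl'] / costs['peft']:.1f}x more tokens per request")
```

PEFT pays its cost once, at fine-tuning time; ICL pays on every inference call, which is the trade-off the UNC Chapel Hill paper quantifies.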

Conclusion

Fine-tuning is a testament to the flexibility and adaptability of deep learning models. It showcases how foundational knowledge in a model can be repurposed and refined to cater to evolving or niche tasks without building from scratch.

I look forward to seeing the potential that PEFT, ICL, and other novel techniques can unlock in the future of NLP adaptation.

Thanks so much for reading my article!!! Feel free to drop me any comments, suggestions, and follow me on LinkedIn!

Written by Jed Lee

Passionate about AI & NLP. Based in Singapore. Currently a Data Scientist at PatSnap.
