Low Rank Adaptation (LoRA)

LoRA is a parameter-efficient fine-tuning (PEFT) technique that freezes the pre-trained model weights and injects trainable rank-decomposition matrices into the model's layers.
Instead of updating every weight during fine-tuning, LoRA represents the update to a weight matrix as the product of two much smaller low-rank matrices.
This greatly reduces the number of trainable parameters while largely maintaining model performance.
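To make the decomposition concrete: for a frozen weight W of shape d x k, LoRA learns ΔW = B·A with B of shape d x r and A of shape r x k, where the rank r is much smaller than d and k. The layer then computes h = W x + (alpha / r) · B (A x). A is initialized randomly and B to zeros, so the adapted layer starts out identical to the base layer. The sketch below is a minimal pure-Python illustration under those assumptions; the class and function names, the Gaussian std of 0.01, and the toy shapes are hypothetical choices, not PEFT's implementation.

```python
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical minimal LoRA layer: h = W x + (alpha / r) * B (A x)."""

    def __init__(self, weight, r, alpha, seed=0):
        d, k = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight  # frozen pre-trained W, shape d x k (never updated)
        # Trainable A, shape r x k, small random init (std 0.01 is an assumption)
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(r)]
        # Trainable B, shape d x r, zero init so that B @ A == 0 at the start
        self.B = [[0.0] * r for _ in range(d)]
        self.scale = alpha / r  # scaling factor applied to the low-rank update

    def forward(self, x):
        base = matvec(self.weight, x)  # frozen path
        # Low-rank path: apply A then B, without ever materializing the d x k matrix B @ A
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


def param_counts(d, k, r):
    """Trainable parameters for one d x k weight: full fine-tuning vs. LoRA with rank r."""
    return d * k, r * (d + k)


if __name__ == "__main__":
    W = [[1.0, 2.0, 0.0], [0.0, 1.0, -1.0]]  # toy 2 x 3 frozen weight
    layer = LoRALinear(W, r=1, alpha=2)
    print(layer.forward([1.0, 1.0, 1.0]))  # → [3.0, 0.0], same as W x because B starts at zero
    full, lora = param_counts(4096, 4096, 8)
    print(full, lora, f"{lora / full:.2%}")  # → 16777216 65536 0.39%
```

The parameter count shows where the savings come from: a 4096 x 4096 weight has about 16.8M entries, while a rank-8 adapter for it trains r·(d + k) = 65,536 values, roughly 0.4% as many.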
QLoRA improves memory efficiency further by quantizing the frozen base model to 4-bit precision and training LoRA adapters on top of it.
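A quick back-of-the-envelope calculation shows why quantizing the frozen base model matters (the QLoRA paper stores it in a 4-bit NormalFloat, NF4, format). This sketch assumes a hypothetical 7B-parameter model and counts the weights alone, ignoring quantization constants, activations, and the adapter and optimizer state.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate storage for model weights alone, in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    n = 7_000_000_000  # hypothetical 7B-parameter base model
    for label, bits in [("16-bit (fp16/bf16)", 16), ("4-bit", 4)]:
        print(f"{label}: {weight_memory_gb(n, bits):.1f} GB")
```

Going from 16-bit to 4-bit weights cuts this figure by 4x (14.0 GB to 3.5 GB here), which is what lets QLoRA fine-tune larger models on a single GPU.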
Ref:
- https://huggingface.co/learn/nlp-course/chapter11/4?fw=pt
- https://huggingface.co/learn/smol-course/unit1/3a
Library: https://github.com/huggingface/peft

LoRA Configuration

PEFT (parameter-efficient fine-tuning) lets you adapt large models by training a small number of additional parameters while keeping the base model frozen.
The most widely used PEFT method is LoRA.
These methods can be used at any training stage, whether SFT or RLHF.
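In the PEFT library linked in the references, LoRA is configured through a `LoraConfig` object. The fragment below is a sketch with illustrative values; in particular, the `target_modules` names assume a Llama-style model whose attention projections are called `q_proj` and `v_proj`, and other architectures use different module names.

```python
from peft import LoraConfig, TaskType

# Illustrative values only; tune r / lora_alpha / lora_dropout for your task.
lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # updates are scaled by lora_alpha / r
    lora_dropout=0.05,                    # dropout on the LoRA branch during training
    target_modules=["q_proj", "v_proj"],  # assumption: Llama-style attention layer names
    task_type=TaskType.CAUSAL_LM,         # causal language modeling
)
```

Passing this config with a loaded base model to `get_peft_model(model, lora_config)` wraps the model so that only the adapter weights are trainable, and calling `print_trainable_parameters()` on the result reports how many parameters that is.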