Supervised Fine-Tuning (SFT)

  • It is a process primarily used to adapt pre-trained language models to follow instructions, engage in dialogue, and use specific output formats.
  • It helps transform base models into assistant-like models that can better understand and respond to user prompts.
  • Template Control: It allows precise control over the model’s output structure, so that the model can:
    • Generate responses in a specific chat template format
    • Follow strict output schemas
    • Maintain consistent styling across responses
  • Domain Adaptation: It helps align the model with domain-specific requirements by:
    • Teaching domain terminology and concepts
    • Enforcing professional standards
    • Handling technical queries appropriately
    • Following industry-specific guidelines
  • Use cases
    • Instruction Following
    • Structured Data Extraction
    • Tool calling
    • Domain Specialization
    • Coding Assistants
  • Ref: https://huggingface.co/learn/llm-course/chapter11/3

Steps for SFT

Process Data and Set Up the Model

  • Dataset (e.g. HuggingFaceTB/smoltalk)
    • Each training example should have
      • Input prompt: user’s instruction/question
      • Expected Response: ideal assistant response
      • Context (optional): additional info
  • Set up the model
  • Configure the tokenizer
  • Set up the chat template
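
The example fields listed above can be packed into a conversational record before a chat template is applied. The following is a minimal sketch, not the layout of any particular dataset: `to_messages` is a hypothetical helper, and putting the optional context into a system message is an assumption made for illustration.

```python
def to_messages(prompt, response, context=None):
    """Build one training example as a list of role/content messages."""
    messages = []
    if context:
        # Assumption: optional context is passed as a system message.
        messages.append({"role": "system", "content": context})
    messages.append({"role": "user", "content": prompt})
    messages.append({"role": "assistant", "content": response})
    return {"messages": messages}


example = to_messages(
    prompt="What is 15 × 24?",
    response="15 × 24 = 360",
    context="You are a helpful math tutor.",
)
print([m["role"] for m in example["messages"]])
```

Keeping every example in one uniform shape makes it easier to apply the same chat template to the whole dataset.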

Training

  • Training Dataset
  • Validation Dataset
  • Set up training hyperparameters
    • max steps or number of epochs: Controls total training duration
    • batch size: Determines memory usage and training stability
    • gradient accumulation steps: Enables larger effective batch sizes
    • learning rate: Controls size of weight updates
    • warmup ratio: Portion of training used for learning rate warmup
    • save steps: Frequency of model checkpoint saves
    • eval steps: How often to evaluate on validation data
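
How these hyperparameters interact can be checked with a little arithmetic. This is a rough sketch under stated assumptions: `training_schedule` is a hypothetical helper, the numbers are made up, and real trainers may round steps slightly differently.

```python
import math


def training_schedule(num_examples, per_device_batch_size,
                      gradient_accumulation_steps, num_epochs,
                      warmup_ratio, num_devices=1):
    """Estimate optimizer steps and warmup length for a training run."""
    # Gradient accumulation multiplies the batch seen per weight update.
    effective_batch_size = (
        per_device_batch_size * gradient_accumulation_steps * num_devices
    )
    steps_per_epoch = math.ceil(num_examples / effective_batch_size)
    total_steps = steps_per_epoch * num_epochs
    # Warmup ratio is the fraction of all steps spent ramping up the LR.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch_size": effective_batch_size,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


schedule = training_schedule(
    num_examples=10_000,          # illustrative dataset size
    per_device_batch_size=4,
    gradient_accumulation_steps=8,
    num_epochs=3,
    warmup_ratio=0.1,
)
print(schedule)
```

With these illustrative values, each weight update sees 32 examples even though only 4 are held in memory at a time.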

Metrics

Evaluation

Instruction Tuning

  • It is the process of adapting pre-trained language models to follow human instructions and engage in conversations

Base Model vs Instruct Model

Chat Templates

  • The examples below use the ChatML format, which wraps each message in <|im_start|> and <|im_end|> markers
  • Always use the same template format for training and inference
  • Use cases
    • Instruction Tuning
    • Enable Thinking mode
    • Structured output generation
    • Code completion
    • Step by Step reasoning
  • Advanced Use cases
    • Multimodal templates: handles images/audio/video
    • Document integration: include docs and knowledge bases
    • Custom template creation: domain specific
    • Template optimization: performance tuning

Enable Thinking mode

  • Standard mode output
<|im_start|>user
What is 15 × 24?<|im_end|>
<|im_start|>assistant
15 × 24 = 360<|im_end|>
  • Thinking mode output
<|im_start|>user
What is 15 × 24?<|im_end|>
<|im_start|>assistant
<|thinking|>
I need to multiply 15 by 24. Let me break this down:
15 × 24 = 15 × (20 + 4) = (15 × 20) + (15 × 4) = 300 + 60 = 360
</|thinking|>
 
15 × 24 = 360<|im_end|>
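
When thinking mode is on, the reasoning usually has to be separated from the final answer before showing it to a user. A minimal sketch follows, assuming the `<|thinking|>` and `</|thinking|>` tags from the example above; `split_thinking` is a hypothetical helper, and actual models may use different tags.

```python
import re

# Assumption: the tag names are copied from the example output above.
THINK_PATTERN = re.compile(r"<\|thinking\|>(.*?)</\|thinking\|>", re.DOTALL)


def split_thinking(text):
    """Return (reasoning, answer); reasoning is None if no block is found."""
    match = THINK_PATTERN.search(text)
    if match is None:
        return None, text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the thinking block is treated as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


output = (
    "<|thinking|>\n"
    "I need to multiply 15 by 24.\n"
    "</|thinking|>\n"
    "\n"
    "15 × 24 = 360"
)
reasoning, answer = split_thinking(output)
print(reasoning)
print(answer)
```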

Generation prompt

  • Appends the start of an assistant turn to the formatted conversation, so the model generates an assistant response instead of continuing the user’s message
  • Used for
    • inference: Yes
    • training: No
    • evaluation: Yes
  • Example
from transformers import AutoTokenizer
 
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")
 
messages = [
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"},
    {"role": "user", "content": "Can I ask a question?"}
]
 
# With generation prompt
formatted_chat = tokenizer.apply_chat_template(
    messages, 
    tokenize=False, 
    add_generation_prompt=True
)
 
print(formatted_chat)
  • Without Generation prompt
<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
  • With Generation prompt
<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant
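
The difference between the two outputs above can be reproduced with a simplified, hand-written ChatML renderer. This is only a sketch: `render_chatml` is a hypothetical helper, not the tokenizer's real template, which may also insert a system prompt or other fields.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Render messages in a simplified ChatML layout."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    )
    if add_generation_prompt:
        # Open an assistant turn so the model answers as the assistant.
        text += "<|im_start|>assistant\n"
    return text


messages = [
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"},
    {"role": "user", "content": "Can I ask a question?"},
]

without_prompt = render_chatml(messages)
with_prompt = render_chatml(messages, add_generation_prompt=True)

# The only difference is the opened assistant turn at the end.
print(with_prompt[len(without_prompt):])
```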

Continue Final Message

  • Makes the model continue the last message in a conversation instead of starting a new one
  • The last message’s role must be assistant
  • Use cases
    • Structured Output Generation
    • Code completion
    • Step by Step Reasoning
# Structured Output: JSON
messages = [
    {"role": "user", "content": "Can you format the answer in JSON?"},
    {"role": "assistant", "content": '{"name": "'},
]
 
# Code completion
messages = [
    {"role": "user", "content": "Write a Python function to calculate factorial"},
    {"role": "assistant", "content": "def factorial(n):\n    if n == 0:\n        return 1\n    else:\n        return n * "}
]
 
# Step by Step Reasoning
messages = [
    {"role": "user", "content": "Solve: 2x + 5 = 13"},
    {
        "role": "assistant",
        "content": "Let me solve this step by step:\n\nStep 1: "
    }
]
 
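# Format the most recently defined conversation (the step-by-step example);
# the final assistant message is left open so the model continues it
formatted_chat = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    continue_final_message=True
)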
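
What "continuing the final message" means for the formatted text can be sketched without a tokenizer. This is an illustration under assumptions: `render_open_ended` is a hypothetical helper using a simplified ChatML layout, and the real template output may differ in details.

```python
def render_open_ended(messages):
    """Render ChatML-style text, leaving the last assistant turn unclosed."""
    if not messages or messages[-1]["role"] != "assistant":
        raise ValueError("the final message must come from the assistant")
    *history, last = messages
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in history
    )
    # No <|im_end|> after the last message: generation resumes right here.
    return text + f"<|im_start|>assistant\n{last['content']}"


messages = [
    {"role": "user", "content": "Can you format the answer in JSON?"},
    {"role": "assistant", "content": '{"name": "'},
]
print(render_open_ended(messages))
```

Because the text ends inside the assistant turn, the model's next tokens extend the prefilled content rather than opening a new message.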