Supervised Fine Tuning (SFT)

It is a process primarily used to adapt pre-trained language models to follow instructions, engage in dialogue, and use specific output formats.
It helps transform base models into assistant-like models that better understand and respond to user prompts.
Template Control: It allows precise control over the model’s output structure, so the model can:
- Generate responses in a specific chat template format
- Follow strict output schemas
- Maintain consistent styling across responses
Domain Adaptation: It helps align the model with domain-specific requirements by:
- Teaching domain terminology and concepts
- Enforcing professional standards
- Handling technical queries appropriately
- Following industry-specific guidelines
Use cases
- Instruction Following
- Structured Data Extraction
- Tool calling
- Domain Specialization
- Coding Assistants
Ref: https://huggingface.co/learn/llm-course/chapter11/3

Steps for SFT

Processing Data and Setup Model

Dataset (e.g., HuggingFaceTB/smoltalk)
- Each training example should contain:
  - Input prompt: user’s instruction/question
  - Expected Response: ideal assistant response
  - Context (optional): additional info
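A minimal sketch of turning one such example into the role/content message list that chat templates consume. The record fields `prompt`, `response`, and `context`, the helper name `to_messages`, and the choice to pass context as a system message are all assumptions for illustration; datasets such as smoltalk may already store conversations as message lists.

```python
from typing import Optional


def to_messages(prompt: str, response: str, context: Optional[str] = None) -> list[dict]:
    """Convert one hypothetical (prompt, response, context) record into chat messages."""
    messages = []
    if context:
        # Assumption: optional context is passed to the model as a system message.
        messages.append({"role": "system", "content": context})
    messages.append({"role": "user", "content": prompt})         # input prompt
    messages.append({"role": "assistant", "content": response})  # expected response
    return messages


example = to_messages(
    prompt="What is the capital of France?",
    response="The capital of France is Paris.",
    context="Answer in one sentence.",
)
print([m["role"] for m in example])  # ['system', 'user', 'assistant']
```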
Set up the model
Configure the tokenizer
Set up the chat template

Training

Training Dataset
Validation Dataset
Set up training hyperparameters
- max steps or number of epochs: Controls total training duration
- batch size (per device): Determines memory usage and training stability
- gradient accumulation steps: Enables larger effective batch sizes
- learning rate: Controls size of weight updates
- warmup ratio: Portion of training used for learning rate warmup
- save steps: Frequency of model checkpoint saves
- eval steps: How often to evaluate on validation data
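A small sketch of how some of these settings interact: per-device batch size, gradient accumulation steps, and device count multiply into the effective batch size, and the warmup ratio becomes a step count. The class and field names are illustrative (not the exact TRL/transformers argument names), the values are made up, and rounding warmup steps up is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class SFTHyperparams:
    # Illustrative values only; tune them for your model, data, and hardware.
    max_steps: int = 1000
    per_device_batch_size: int = 4
    gradient_accumulation_steps: int = 4
    num_devices: int = 1
    learning_rate: float = 5e-5
    warmup_ratio: float = 0.1
    save_steps: int = 100
    eval_steps: int = 50

    def effective_batch_size(self) -> int:
        # Gradients from several small batches are accumulated before one optimizer step.
        return self.per_device_batch_size * self.gradient_accumulation_steps * self.num_devices

    def warmup_steps(self) -> int:
        # Number of initial steps spent ramping the learning rate up from 0.
        return math.ceil(self.max_steps * self.warmup_ratio)

    def eval_schedule(self) -> list[int]:
        # Steps at which validation loss would be computed.
        return list(range(self.eval_steps, self.max_steps + 1, self.eval_steps))


hp = SFTHyperparams()
print(hp.effective_batch_size())  # 16
print(hp.warmup_steps())          # 100
print(len(hp.eval_schedule()))    # 20
```

Raising gradient accumulation instead of the per-device batch size keeps memory usage roughly constant while still growing the effective batch.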

Metrics

Ref: https://huggingface.co/learn/smol-course/unit1/3
Metrics to monitor:
- Training Loss
- Validation Loss
- Learning rate progression
- Gradient norms
- GPU memory usage
- Training throughput
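Two of these metrics can be computed by hand to build intuition. The sketch below assumes linear warmup followed by linear decay to zero (assumed here to match the shape of the default `linear` scheduler in the Hugging Face `Trainer`) and computes the global gradient norm as the L2 norm over all gradient values. Function names are hypothetical, and gradients are plain Python lists rather than tensors.

```python
import math


def linear_warmup_decay_lr(step: int, peak_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at `step`: linear ramp up to peak_lr, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


def global_grad_norm(grads: list[list[float]]) -> float:
    """L2 norm over all gradient values, treating every parameter as one long vector."""
    return math.sqrt(sum(g * g for tensor in grads for g in tensor))


# Learning rate progression for a short 10-step run with 2 warmup steps.
lrs = [linear_warmup_decay_lr(s, 1e-4, warmup_steps=2, total_steps=10) for s in range(11)]
print([f"{lr:.1e}" for lr in lrs])

# Gradient norm for two tiny "parameters": sqrt(3^2 + 4^2 + 0^2) = 5.0
print(global_grad_norm([[3.0, 4.0], [0.0]]))
```

A sudden spike in the gradient norm or a learning rate curve that does not match the configured warmup are both quick signals that something in the setup is off.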

Evaluation

See Evaluation

Instruction Tuning

It is the process of adapting pre-trained language models to follow human instructions and engage in conversations.

Base Model vs Instruct Model

Ref: https://huggingface.co/learn/smol-course/unit1/2
Base models are trained to predict the next token
- This gives them general language understanding
Instruction-tuned models are trained to:
- Follow user instructions
- Engage in natural conversations
- Provide helpful, harmless, and honest responses
- Maintain context in multi-turn conversations
- Use tools or MCP servers to perform tasks
Base model:
- Input: The weather today is
- Output: sunny and warm
- Example: https://huggingface.co/HuggingFaceTB/SmolLM3-3B-Base
Instruct model:
- Input: What’s the weather like?
- Output: The weather is sunny and warm
- Example: https://huggingface.co/HuggingFaceTB/SmolLM3-3B
Instruct models define a chat template
- https://huggingface.co/HuggingFaceTB/SmolLM3-3B/blob/main/chat_template.jinja
- SmolLM3 uses the ChatML format

Chat Templates

The examples in this section use the ChatML format
Always use the same template format for training and inference
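To see what a ChatML template produces, here is a hand-written sketch of ChatML-style rendering in plain Python. It is not the Jinja template shipped with SmolLM3 (real templates may add a default system message or other extras); it only shows the `<|im_start|>role ... <|im_end|>` wrapping. The function name `render_chatml` is hypothetical. Routing both training data and inference prompts through one function like this is one way to keep the two formats identical.

```python
def render_chatml(messages: list[dict]) -> str:
    """Simplified ChatML: <|im_start|>role, newline, content, <|im_end|>, newline."""
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )


conversation = [
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"},
]
print(render_chatml(conversation))
```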
Use cases
- Instruction Tuning
- Enable Thinking mode
- Structured output generation
- Code completion
- Step by Step reasoning
Advanced Use cases
- Multimodal templates: handles images/audio/video
- Document integration: include docs and knowledge bases
- Custom template creation: domain specific
- Template optimization: performance tuning

Enable Thinking mode

Standard mode output

<|im_start|>user
What is 15 × 24?<|im_end|>
<|im_start|>assistant
15 × 24 = 360<|im_end|>

Thinking mode output

<|im_start|>user
What is 15 × 24?<|im_end|>
<|im_start|>assistant
<|thinking|>
I need to multiply 15 by 24. Let me break this down:
15 × 24 = 15 × (20 + 4) = (15 × 20) + (15 × 4) = 300 + 60 = 360
</|thinking|>
 
15 × 24 = 360<|im_end|>
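With thinking mode on, the reasoning usually has to be separated from the final answer before it is shown to a user. A minimal sketch using the `<|thinking|>` / `</|thinking|>` markers from the example above; the exact markers depend on the model's chat template, so treat them (and the helper name `split_thinking`) as placeholders.

```python
import re

# Placeholder markers copied from the example; check your model's chat template for the real ones.
THINK_PATTERN = re.compile(r"<\|thinking\|>(.*?)</\|thinking\|>", re.DOTALL)


def split_thinking(response: str) -> tuple[str, str]:
    """Return (reasoning, answer) from an assistant response."""
    match = THINK_PATTERN.search(response)
    if match is None:
        # Standard mode: no reasoning block, the whole response is the answer.
        return "", response.strip()
    reasoning = match.group(1).strip()
    answer = (response[: match.start()] + response[match.end():]).strip()
    return reasoning, answer


reply = (
    "<|thinking|>\nI need to multiply 15 by 24.\n"
    "15 × 24 = 300 + 60 = 360\n</|thinking|>\n\n15 × 24 = 360"
)
reasoning, answer = split_thinking(reply)
print(answer)  # 15 × 24 = 360
```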

Generation prompt

Appends a generation prompt (the assistant turn header) at the end so that the model generates an assistant response instead of continuing the user’s message
Used for:
- inference: Yes
- training: No
- evaluation: Yes
Example

from transformers import AutoTokenizer
 
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")
 
messages = [
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"},
    {"role": "user", "content": "Can I ask a question?"}
]
 
# With generation prompt
formatted_chat = tokenizer.apply_chat_template(
    messages, 
    tokenize=False, 
    add_generation_prompt=True
)
 
print(formatted_chat)

Output without generation prompt (add_generation_prompt=False)

<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>

Output with generation prompt (add_generation_prompt=True)

<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant
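The only difference between the two outputs is the trailing assistant header. A minimal string-level sketch of that difference, assuming the ChatML formatting shown above; the helper name `add_generation_prompt` is hypothetical, and in practice a template that supports the `add_generation_prompt=True` flag of `apply_chat_template` does this for you.

```python
GENERATION_PROMPT = "<|im_start|>assistant\n"  # ChatML assistant header, as in the output above


def add_generation_prompt(formatted_chat: str) -> str:
    """Append an open assistant turn so the model answers instead of continuing the user."""
    return formatted_chat + GENERATION_PROMPT


without_prompt = (
    "<|im_start|>user\nHi there!<|im_end|>\n"
    "<|im_start|>assistant\nNice to meet you!<|im_end|>\n"
    "<|im_start|>user\nCan I ask a question?<|im_end|>\n"
)
with_prompt = add_generation_prompt(without_prompt)
print(with_prompt.endswith("<|im_start|>assistant\n"))  # True
```

During training the assistant reply is already part of the text, so this open header is not needed there.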

Continue Final Message

Makes the model continue the last message in a conversation instead of starting a new one
The last message should be the (partial) assistant response to continue
Use cases
- Structured Output Generation
- Code completion
- Step by Step Reasoning

# Structured Output: JSON
messages = [
    {"role": "user", "content": "Can you format the answer in JSON?"},
    {"role": "assistant", "content": '{"name": "'},
]
 
# Code completion
messages = [
    {"role": "user", "content": "Write a Python function to calculate factorial"},
    {"role": "assistant", "content": "def factorial(n):\n    if n == 0:\n        return 1\n    else:\n        return n * "}
]
 
# Step by Step Reasoning
messages = [
    {"role": "user", "content": "Solve: 2x + 5 = 13"},
    {
        "role": "assistant",
        "content": "Let me solve this step by step:\n\nStep 1: "
    }
]
 
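# Formats the last `messages` defined above (step-by-step reasoning);
# `tokenizer` is the one loaded in the generation prompt example
formatted_chat = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    continue_final_message=True
)
print(formatted_chat)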
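Conceptually, continuing the final message means the formatted text ends with the partial assistant content instead of a closing `<|im_end|>`, so generation picks up mid-message. A simplified string-level sketch of that effect, assuming ChatML; the helper `open_final_message` is hypothetical and is not how `transformers` implements the flag.

```python
END_OF_TURN = "<|im_end|>\n"  # ChatML end-of-turn marker plus newline


def open_final_message(formatted_chat: str) -> str:
    """Drop the closing end-of-turn marker so generation continues the last message."""
    if formatted_chat.endswith(END_OF_TURN):
        return formatted_chat[: -len(END_OF_TURN)]
    return formatted_chat


closed = (
    "<|im_start|>user\nSolve: 2x + 5 = 13<|im_end|>\n"
    "<|im_start|>assistant\nLet me solve this step by step:\n\nStep 1: <|im_end|>\n"
)
prompt = open_final_message(closed)
print(prompt.endswith("Step 1: "))  # True
```

Earlier turns keep their `<|im_end|>` markers; only the final, partial assistant turn is left open.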
