Always use same template format for training and inference
Use cases
Instruction Tuning
Enable Thinking mode
Structured output generation
Code completion
Step by Step reasoning
Advanced Use cases
Multimodal templates: handles images/audio/video
Document integration: include docs and knowledge bases
Custom template creation: domain specific
Template optimization: performance tuning
Enable Thinking mode
Standard mode output
<|im_start|>userWhat is 15 × 24?<|im_end|><|im_start|>assistant15 × 24 = 360<|im_end|>
Thinking mode output
<|im_start|>userWhat is 15 × 24?<|im_end|><|im_start|>assistant<|thinking|>I need to multiply 15 by 24. Let me break this down:15 × 24 = 15 × (20 + 4) = (15 × 20) + (15 × 4) = 300 + 60 = 360</|thinking|>15 × 24 = 360<|im_end|>
Generation prompt
Adds generation prompt at the end to make sure the model always give bot response instead of continuing user’s message
Used for
inference: Yes
training: No
evaluation: Yes
Example
from transformers import AutoTokenizertokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")messages = [ {"role": "user", "content": "Hi there!"}, {"role": "assistant", "content": "Nice to meet you!"}, {"role": "user", "content": "Can I ask a question?"}]# With generation promptformatted_chat = tokenizer.apply_chat_template( messages, tokenize=False, add_generation_prompt=True)print(formatted_chat)
Without Generation prompt
<|im_start|>userHi there!<|im_end|><|im_start|>assistantNice to meet you!<|im_end|><|im_start|>userCan I ask a question?<|im_end|>
With Generation prompt
<|im_start|>userHi there!<|im_end|><|im_start|>assistantNice to meet you!<|im_end|><|im_start|>userCan I ask a question?<|im_end|><|im_start|>assistant
Continue Final Message
Make the model continue the last message in a conversation instead of starting a new one
The last role must be assistant
Use cases
Structured Output Generation
Code completion
Step by Step Reasoning
# Structured Output: JSONmessages = [ {"role": "user", "content": "Can you format the answer in JSON?"}, {"role": "assistant", "content": '{"name": "'},]# Code completionmessages = [ {"role": "user", "content": "Write a Python function to calculate factorial"}, {"role": "assistant", "content": "def factorial(n):\n if n == 0:\n return 1\n else:\n return n * "}]# Step by Step Reasoningmessages = [ {"role": "user", "content": "Solve: 2x + 5 = 13"}, { "role": "assistant", "content": "Let me solve this step by step:\n\nStep 1: " }]formatted_chat = tokenizer.apply_chat_template( chat, tokenize=False, continue_final_message=True)