LLM_Architecture

Jun 28, 2026 (1 min read)

LLM Architecture

https://huggingface.co/learn/llm-course/chapter1/4
https://huggingface.co/learn/llm-course/chapter1/6

Inference

Inference is the process of using a trained LLM to generate human-like text from a given input prompt.
Ref: https://huggingface.co/learn/llm-course/chapter1/8

Prefill Phase

This phase is computationally intensive because the model processes every token of the input prompt at once.
Steps:
- Tokenization: Converting the input text into tokens (think of these as the basic building blocks the model understands)
- Embedding Conversion: Transforming these tokens into numerical representations that capture their meaning
- Initial Processing: Running these embeddings through the model’s neural networks to create a rich understanding of the context
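
A minimal sketch of the three prefill steps above, using only the Python standard library. Everything here is a toy stand-in and an assumption, not how a real model is built: the vocabulary `VOCAB`, the whitespace tokenizer, the randomly initialised embedding table, and a single causal self-attention pass without learned query/key/value projections. It is only meant to show that the whole prompt is handled in one pass.

```python
import math
import random

# Hypothetical toy vocabulary; real models use learned subword tokenizers.
VOCAB = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3, "on": 4, "mat": 5}
EMBED_DIM = 4

def tokenize(text):
    """Step 1: map whitespace-separated words to integer token IDs."""
    return [VOCAB.get(word, VOCAB["<unk>"]) for word in text.lower().split()]

def make_embedding_table(seed=0):
    """Random stand-in for a learned embedding matrix (one row per token ID)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in VOCAB]

def embed(token_ids, table):
    """Step 2: look up one vector per token ID."""
    return [table[t] for t in token_ids]

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(vectors):
    """Step 3: each position mixes in itself and all earlier positions."""
    outputs = []
    for i, query in enumerate(vectors):
        visible = vectors[: i + 1]  # no looking ahead
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(EMBED_DIM)
            for key in visible
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, visible)) for d in range(EMBED_DIM)]
        )
    return outputs

def prefill(prompt):
    """Run all three steps over the full prompt in one pass."""
    token_ids = tokenize(prompt)
    table = make_embedding_table()
    hidden = causal_self_attention(embed(token_ids, table))
    return token_ids, hidden

token_ids, hidden = prefill("The cat sat on the mat")
print(token_ids)                    # [1, 2, 3, 4, 1, 5]
print(len(hidden), len(hidden[0]))  # 6 4
```

The attention step compares every prompt position with every earlier one, so its cost grows quickly with prompt length, which is one way to see why prefill is the compute-heavy phase.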

Decode Phase

This phase is where the actual text generation happens.
The model generates one token at a time, feeding each new token back in as input. This is called an autoregressive process.
This phase is memory-intensive because the model needs to keep track of all previously generated tokens and their relationships.
Steps:
- Attention Computation: Looking back at all previous tokens to understand context
- Probability Calculation: Determining the likelihood of each possible next token
- Token Selection: Choosing the next token based on these probabilities
- Continuation Check: Deciding whether to continue or stop generation
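
The four decode steps above can be sketched as a small loop. This is an illustration under stated assumptions, not a real model: `NEXT_SCORES` is a hypothetical hand-written score table, `toy_logits` stands in for attention by looking back over every previous token with a distance-based weight, and token selection is plain greedy (pick the most probable token).

```python
import math

VOCAB = ["<eos>", "the", "cat", "sat", "down"]
EOS_ID = 0

# Hypothetical scores: NEXT_SCORES[prev][nxt] is how strongly `prev` votes for `nxt`.
NEXT_SCORES = {
    1: {2: 3.0},  # "the"  -> "cat"
    2: {3: 3.0},  # "cat"  -> "sat"
    3: {4: 3.0},  # "sat"  -> "down"
    4: {0: 3.0},  # "down" -> "<eos>"
}

def toy_logits(context):
    """Stand-in for attention: every previous token contributes, recent ones more."""
    logits = [0.0] * len(VOCAB)
    for distance, token in enumerate(reversed(context)):
        weight = 0.5 ** distance
        for nxt, score in NEXT_SCORES.get(token, {}).items():
            logits[nxt] += weight * score
    return logits

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate(prompt_ids, max_new_tokens=10):
    context = list(prompt_ids)  # grows by one token per iteration
    generated = []
    for _ in range(max_new_tokens):
        logits = toy_logits(context)                             # attention computation (toy)
        probs = softmax(logits)                                  # probability calculation
        next_id = max(range(len(probs)), key=probs.__getitem__)  # token selection (greedy)
        if next_id == EOS_ID:                                    # continuation check
            break
        context.append(next_id)
        generated.append(next_id)
    return generated

new_ids = generate([1])  # prompt: "the"
print(" ".join(VOCAB[i] for i in new_ids))  # cat sat down
```

The loop stops either when the end-of-sequence token is selected or when `max_new_tokens` is reached. Greedy selection is only one option; sampling from `probs` would be a drop-in change at the token selection line.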
