Observability

https://huggingface.co/learn/agents-course/bonus-unit2/what-is-agent-observability-and-evaluation
Use cases
- Understand cost accuracy tradeoff
- Measure Latency
- Detect Harmful language and prompt injection
- Monitor user feedback
Tools
- Langfuse
- Arize
- LlamaIndex
- LangGraph

Traces and Spans

Traces: represent a complete agent task from start to finish
- user query
Spans: individual steps within the trace
- calling language model
- retrieving data

Latency: How quickly does the agent respond?
Costs: What’s the expense per agent run?
Request Errors: How many requests did the agent fail?
User Feedback: Implementing direct user evaluations provide valuable insights
- explicit ratings (thumbs-up/down, 1-5 stars) or textual comments by user
Implicit User Feedback: User behaviors provide indirect feedback even without explicit ratings
- immediate question rephrasing
- repeated queries
- clicking a retry button
Accuracy: How frequently does the agent produce correct or desirable outputs?
Automated Evaluation Metrics: Like LLM-as-a-Judge

Time to First Token (TTFT)
- How quickly can you get the first response?
- This is crucial for user experience and is primarily affected by the prefill phase
Time Per Output Token (TPOT)
- How fast can you generate subsequent tokens?
- This determines the overall generation speed
Throughput
- How many requests can you handle simultaneously?
- This affects scaling and cost efficiency
VRAM Usage
- How much GPU memory do you need?
- This often becomes the primary constraint in real-world applications

https://mikeveerman.github.io/tokenspeed/?rate=60&mode=text
Measured in TPS (Token per second)
Humans reading speed
- 1 Token in english ~ 0.75 word
- 200 - 300 words per minute = 3.3 - 5 words per second
- = 4 - 7 TPS
Human Typing speed
- 40 - 60 words per minute = 0.67 - 1 words per second
- = 0.89 - 1.3 TPS
For Chat bots
- 10 - 20 TPS is good
For Agents or Coding tasks
- 50+ TPS is good since you don’t need to read everything
MacBook M1 Pro, 16 GB
- Qwen 2.5 (7b): 27 TPS
- Gemma 3 (4b): 44 TPS
- Gemma 4 (12b): 13 TPS