Observability

Ability to measure a system’s current state based on the data it generates, such as logs, metrics and traces.
Advantages:
- Better Visibility
- Better Alerting
- Better Efficiency
3 Pillars:
- Logs
- Metrics
- Tracing
These 3 pillars are referred to as Telemetry Data
Facade libraries in Spring Boot:
- SLF4J (logs)
- Micrometer (metrics)
- Spring Cloud Sleuth (tracing), succeeded by Micrometer Tracing in Spring Boot 3

Logging

Detailed information about individual events happening in your application
Centralized logging is preferable because simple (local, per-instance) logging:
- Doesn’t scale well
- Has low usability: it is difficult to reconstruct the chain of events from interleaved concurrent logs
Stream log output to a centralized logging system
Use standard formats like:
- Common Log Format (NCSA)
- JSON Log Format
- For more formats: https://graylog.org/post/log-formats-a-complete-guide/
Using a standard format helps the centralized logging system index entries and makes searches faster
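A minimal sketch of what a JSON log line could look like, using only the JDK. The class name `JsonLogLine` and the field names (`timestamp`, `level`, `logger`, `message`) are assumptions for illustration, not a format required by any particular logging system; in a real Spring Boot application a logging library encoder would normally produce this output.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: renders one log event as a single-line JSON object.
final class JsonLogLine {

    // Escapes the characters that are not allowed unescaped inside a JSON string.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds the JSON line; LinkedHashMap keeps the fields in insertion order.
    static String format(Instant timestamp, String level, String logger, String message) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("timestamp", timestamp.toString());
        fields.put("level", level);
        fields.put("logger", logger);
        fields.put("message", message);
        return fields.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":\"" + escape(e.getValue()) + "\"")
                .collect(Collectors.joining(",", "{", "}"));
    }

    public static void main(String[] args) {
        System.out.println(format(Instant.now(), "INFO", "com.example.OrderService", "Order created"));
    }
}
```

Because every entry has the same named fields, a log indexer can parse each line without a custom pattern per application.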
Example tools:
- ELK stack: Logstash → Elasticsearch → Kibana
- Log management platforms:
  - Splunk
  - Graylog
  - SolarWinds Loggly
- Cloud providers:
  - Google Cloud Logging
  - Amazon CloudWatch
  - Azure Log Analytics
Spring Boot Logging

Metrics are aggregated information, such as counts and averages, about application features
Some types of metrics:
- Counters
- Gauges
- Timers
- Summary
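The four metric types listed above can be sketched with plain JDK classes. This is an illustration of the concepts only, not the Micrometer API; every class and method name below is hypothetical.

```java
import java.util.concurrent.atomic.DoubleAdder;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.DoubleSupplier;

// Plain-JDK illustration of the four metric types; all names are hypothetical.
final class MetricTypesSketch {

    // Counter: a value that only increases, e.g. number of requests served.
    static final class Counter {
        private final LongAdder count = new LongAdder();
        void increment() { count.increment(); }
        long count() { return count.sum(); }
    }

    // Gauge: reads the current value of something on demand, e.g. queue size.
    static final class Gauge {
        private final DoubleSupplier source;
        Gauge(DoubleSupplier source) { this.source = source; }
        double value() { return source.getAsDouble(); }
    }

    // Summary: tracks how many amounts were recorded and their total, e.g. payload sizes.
    static final class Summary {
        private final LongAdder count = new LongAdder();
        private final DoubleAdder total = new DoubleAdder();
        void record(double amount) { count.increment(); total.add(amount); }
        long count() { return count.sum(); }
        double total() { return total.sum(); }
        double mean() { long n = count(); return n == 0 ? 0.0 : total() / n; }
    }

    // Timer: a summary specialised for durations, kept here in nanoseconds.
    static final class Timer {
        private final LongAdder count = new LongAdder();
        private final LongAdder totalNanos = new LongAdder();
        void record(long nanos) { count.increment(); totalNanos.add(nanos); }
        void record(Runnable task) {
            long start = System.nanoTime();
            try { task.run(); } finally { record(System.nanoTime() - start); }
        }
        long count() { return count.sum(); }
        long totalNanos() { return totalNanos.sum(); }
    }
}
```

The key difference is who owns the value: a counter, summary, or timer accumulates what the application records, while a gauge only observes a value that lives elsewhere.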
Examples:
- CPU Usage: 18%
- Memory Usage: 195 MB
- Disk Read/Write: 51.2 MB
- Network I/O: 3.7 GB/1.8 GB
Advantages:
- Alerts: an alert can be raised when a metric meets a defined condition, such as crossing a threshold
- Trends: how metrics change over time
- Impact of failure: when something fails, metrics show how far the impact reaches
- Performance tuning
- Verifies the system architecture
There are two ways metrics are collected:
- Push (e.g. New Relic, AppDynamics)
- Pull (e.g. Prometheus)
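The difference between the two collection models comes down to who initiates the transfer. A rough plain-JDK sketch, assuming a hypothetical in-memory registry and a simple `name value` text format (loosely modeled on line-based exposition formats, not an exact reproduction of any vendor's protocol):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.function.Consumer;
import java.util.stream.Collectors;

// Hypothetical in-memory registry that can be read (pull) or can send itself (push).
final class MetricsPublishingSketch {
    // Sorted, thread-safe map so the rendered output has a stable order.
    private final Map<String, Double> values = new ConcurrentSkipListMap<>();

    void set(String name, double value) { values.put(name, value); }

    // Renders the current values as one "name value" line per metric.
    private String snapshot() {
        return values.entrySet().stream()
                .map(e -> e.getKey() + " " + e.getValue())
                .collect(Collectors.joining("\n", "", "\n"));
    }

    // Pull: the monitoring system decides when to call this, e.g. via an HTTP endpoint.
    String scrape() { return snapshot(); }

    // Push: the application decides when to send, e.g. from a scheduled task.
    void push(Consumer<String> backend) { backend.accept(snapshot()); }
}
```

With pull, the application only has to expose the data and the backend controls the schedule; with push, the application needs to know where the backend is and when to send.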
Metrics are held in memory by the application, so it is better to publish them to a monitoring system, which usually stores them in a time series database
Time Series database examples:
- Prometheus
- Wavefront
- Dynatrace
Metrics are published to monitoring systems:
- Elastic APM
- Prometheus
- Dynatrace
- Wavefront
Spring Boot Metrics

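Tracing is sampled information about requests as they flow across multiple services
Only some requests are traced (sampled), since tracing every request can overload the system
In Spring Cloud Sleuth, the default sampler is rate-limited to 10 traces per second; Micrometer Tracing in Spring Boot 3 instead defaults to a sampling probability of 0.1 (10% of requests)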
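A rate limit such as 10 traces per second can be sketched with a fixed one-second window. This is a simplified illustration with hypothetical names, not the sampler implementation used by Sleuth or Micrometer Tracing; the clock is injected so the behaviour can be checked deterministically.

```java
import java.util.function.LongSupplier;

// Hypothetical fixed-window sampler: allows at most N sampled traces per second.
final class RateLimitingSamplerSketch {
    private static final long ONE_SECOND_NANOS = 1_000_000_000L;

    private final int tracesPerSecond;
    private final LongSupplier nanoClock;
    private long windowStart;
    private int sampledInWindow;

    RateLimitingSamplerSketch(int tracesPerSecond) {
        this(tracesPerSecond, System::nanoTime);
    }

    RateLimitingSamplerSketch(int tracesPerSecond, LongSupplier nanoClock) {
        this.tracesPerSecond = tracesPerSecond;
        this.nanoClock = nanoClock;
        this.windowStart = nanoClock.getAsLong();
    }

    // Returns true if this request should be traced.
    synchronized boolean shouldSample() {
        long now = nanoClock.getAsLong();
        if (now - windowStart >= ONE_SECOND_NANOS) {
            // A new one-second window begins: reset the budget.
            windowStart = now;
            sampledInWindow = 0;
        }
        if (sampledInWindow < tracesPerSecond) {
            sampledInWindow++;
            return true;
        }
        return false;
    }
}
```

A rate limit caps tracing overhead regardless of traffic, whereas a probability-based sampler traces a fixed fraction of requests and therefore scales with load.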
Advantages:
- Create service map to show communication between services
- Path breakdown
- Timing information for each service
- Improve Mean Time to Detect (MTTD) and Mean Time to Repair (MTTR)
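To illustrate the per-service timing a trace can provide, here is a small sketch that totals span durations by service. The `Span` type and its fields are hypothetical, and the durations are assumed to be each span's own (non-overlapping) time; real tracing backends work with nested spans and start/end timestamps.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical trace model: each span records which service did what and for how long.
final class TraceBreakdownSketch {

    static final class Span {
        final String service;
        final String operation;
        final long selfTimeMillis; // assumed to exclude time spent in child spans

        Span(String service, String operation, long selfTimeMillis) {
            this.service = service;
            this.operation = operation;
            this.selfTimeMillis = selfTimeMillis;
        }
    }

    // Sums the time spent in each service, in the order services first appear in the trace.
    static Map<String, Long> timePerService(List<Span> spans) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (Span span : spans) {
            totals.merge(span.service, span.selfTimeMillis, Long::sum);
        }
        return totals;
    }
}
```

A breakdown like this points directly at the slowest service on the request path, which is where the improvement in detection and repair times comes from.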
Tracing backend examples:
- Wavefront
- Zipkin (OpenZipkin, originally developed by Twitter and inspired by Google's Dapper paper)
- Jaeger (part of CNCF, originally developed by Uber)
Spring Boot Tracing

TraceID is used to correlate logging and tracing
methodName (URL) is used to correlate tracing and metrics
With the right data, effective correlations can be found to identify the root cause
For example:
- A traffic spike is highly correlated with the user john@email.com
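The TraceID correlation between logs and traces can be sketched as follows: the trace ID is attached to the current thread while a request is handled, every log line includes it, and the logs for one trace can then be looked up by that ID. All names are hypothetical, and the in-memory list stands in for a centralized log store; in Spring Boot the logging context of the logging library typically plays the role of the `ThreadLocal` here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: log lines carry the trace ID of the request being handled.
final class TraceCorrelationSketch {
    // Holds the trace ID of the request currently handled by this thread.
    private static final ThreadLocal<String> CURRENT_TRACE_ID = new ThreadLocal<>();

    // Stand-in for a centralized log store.
    private final List<String> logLines = new ArrayList<>();

    // Runs the request handler with the trace ID bound to the current thread.
    void handleRequest(String traceId, Runnable work) {
        CURRENT_TRACE_ID.set(traceId);
        try {
            work.run();
        } finally {
            CURRENT_TRACE_ID.remove(); // avoid leaking the ID to the next request on this thread
        }
    }

    // Writes a log line prefixed with the current trace ID.
    synchronized void log(String message) {
        logLines.add("traceId=" + CURRENT_TRACE_ID.get() + " " + message);
    }

    // Finds every log line that belongs to the given trace.
    synchronized List<String> findByTraceId(String traceId) {
        String prefix = "traceId=" + traceId + " ";
        return logLines.stream()
                .filter(line -> line.startsWith(prefix))
                .collect(Collectors.toList());
    }
}
```

Starting from a slow or failed trace in the tracing backend, the same ID can then be used as a search key in the logging system.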

Monitoring is collecting and analyzing predefined data types (network bandwidth, CPU utilization rates, etc.) in order to detect abnormal behavior that might indicate problems.
It is part of observability.
With monitoring, you might be asking “Is an individual piece (network, website, application, or other service) up and running as expected?”
With observability, you’re asking a bigger question: “How well is everything working?”

Profiling refers to the practice of collecting and analyzing data about the performance and behavior of software applications or systems

Real User Monitoring, aka RUM
Used in the frontend
Collects information on the users of your apps and the actions they perform in frontend applications

Instrumentation refers to adding capabilities to systems and applications to track and capture information that can be used to observe their behavior and performance

OpenTelemetry is an open source observability framework that provides standardized protocols and tools for collecting and routing telemetry data.