Observability
Meaning
Observability pillars include logs, metrics, and traces. Modern observability also includes metadata, user behavior, topology and network mapping, and code-level details.
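As a rough illustration of how the three pillars relate, the sketch below emits a log line, a metric, and a trace span for a single request, tied together by a shared trace id. It uses only the Python standard library. The names `handle_request`, `Span`, `request_count`, and the `checkout` logger are hypothetical and exist only for this example; a real service would more likely use an instrumentation library such as OpenTelemetry.

```python
import logging
import time
import uuid
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("checkout")  # hypothetical service name

# Metric: an aggregated count per (route, status), cheap to store and alert on.
request_count = Counter()


@dataclass
class Span:
    """Trace span: one timed operation that belongs to a trace."""
    trace_id: str
    name: str
    start: float = field(default_factory=time.perf_counter)
    duration_s: Optional[float] = None

    def finish(self) -> None:
        self.duration_s = time.perf_counter() - self.start


def handle_request(route: str) -> Span:
    """Handle one (simulated) request and emit all three signals."""
    trace_id = uuid.uuid4().hex
    span = Span(trace_id=trace_id, name=f"GET {route}")

    # Log: a discrete event, carrying the trace id so it can be joined to the span.
    log.info("handling route=%s trace_id=%s", route, trace_id)

    # Metric: increment the aggregate counter (status 200 is assumed here).
    request_count[(route, 200)] += 1

    # Trace: close the span so its duration is recorded.
    span.finish()
    return span
```

Carrying the same `trace_id` in the log line and in the span is what lets a slow span be matched with the log events produced during it.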
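Why is it important?
- Observability makes it possible to understand the wide range of situations that arise in a distributed environment.
- Observability shows what needs to be done to handle slow queries, respond to incidents, and optimize performance.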
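One way to make slow queries visible is to time every query and record the ones that exceed a threshold. The sketch below is a minimal version of that idea. `timed_query`, `slow_queries`, and the 0.5 second threshold are assumptions made for illustration and do not come from any particular database driver or APM product.

```python
import time
from contextlib import contextmanager

SLOW_QUERY_THRESHOLD_S = 0.5  # hypothetical threshold; tune per workload

# Recorded slow queries as (sql, elapsed_seconds) tuples.
slow_queries = []


@contextmanager
def timed_query(sql, threshold_s=SLOW_QUERY_THRESHOLD_S, clock=time.perf_counter):
    """Time the wrapped block and record it if it runs at or over the threshold."""
    start = clock()
    try:
        yield
    finally:
        # Runs even if the query raises, so failed slow queries are recorded too.
        elapsed = clock() - start
        if elapsed >= threshold_s:
            slow_queries.append((sql, elapsed))
```

A caller would wrap the actual database call, for example `with timed_query("SELECT ..."): cursor.execute(...)`, where `cursor` is whatever the driver in use provides. The `clock` parameter is there only so that the timing can be replaced in tests.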
Telemetry
- OpenTelemetry: https://opentelemetry.lightstep.com/
- Metrics, logging and tracing: https://peter.bourgon.org/blog/2017/02/21/metrics-tracing-and-logging.html
- Which trace to collect:
Real world
Netflix:
- Application monitoring: https://netflixtechblog.com/telltale-netflix-application-monitoring-simplified-5c08bfa780ba
- Distributed tracing: https://netflixtechblog.com/building-netflixs-distributed-tracing-infrastructure-bb856c319304
- Edgar, solving mysteries faster with observability: https://netflixtechblog.com/edgar-solving-mysteries-faster-with-observability-e1a76302c71f
- Self-serve dashboard: https://netflixtechblog.com/lumen-custom-self-service-dashboarding-for-netflix-8c56b541548c
- Build observability tools: https://netflixtechblog.com/lessons-from-building-observability-tools-at-netflix-7cfafed6ab17
- Bolt, on-instance diagnostics and remediation: https://netflixtechblog.com/introducing-bolt-on-instance-diagnostic-and-remediation-platform-176651b55505
- Netflix system intuition: https://netflixtechblog.com/flux-a-new-approach-to-system-intuition-cf428b7316ec
- Time series data at Netflix: https://netflixtechblog.com/scaling-time-series-data-storage-part-i-ec2b6d44ba39
Case study: Netflix's Elasticsearch -> Cassandra (SSD -> EBS):
- Building Netflix’s Distributed Tracing Infrastructure
- Lessons from Building Observability Tools at Netflix
Coinbase:
AppDynamics vs Dynatrace
- AppDynamics, Dynatrace, OpenTelemetry
- Is Standard Java Logging Dead? Log4j vs. Log4j2 vs. Logback vs. java.util.logging
Application Performance Monitoring
- What is APM? Application performance monitoring in a cloud-native world
- Datadog - APM Terms and Concepts
SLO
ELK
Uber M3
Datadog
- Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
- Datadog + OpenTracing: Embracing the open standard for APM