Updates and recent posts about Grafana Tempo.
Link
@kala shared a link, 1 month, 2 weeks ago
FAUN.dev()

1,500+ PRs Later: Spotify’s Journey with Our Background Coding Agent

Spotify just gave its internal Fleet Management tooling a serious brain upgrade. They've wired in AI coding agents that now handle source-to-source transformations across repos - automatically. So far? Over 1,500 AI-generated PRs pushed. Not just lint fixes - these include heavy-duty migrations. They'.. read more  

Link
@kala shared a link, 1 month, 2 weeks ago
FAUN.dev()

AI and QE: Patterns and Anti-Patterns

The author shared insights on how AI can be leveraged as a QE and highlighted potential dangers to watch out for, drawing parallels with misuse of positive behaviors or characteristics taken out of context. The post outlined anti-patterns related to automating tasks, stimulating thinking, and tailor.. read more  

Link
@devopslinks shared a link, 1 month, 2 weeks ago
FAUN.dev()

How when AWS was down, we were not

During the AWS us-east-1 meltdown - when DynamoDB, IAM, and other key services went dark - Authress kept the lights on. Their trick? A ruthless edge-first, multi-region setup built for failure. They didn’t hope DNS would save them. They wired in automated failover, rolled their own health checks, an.. read more  

Link
@devopslinks shared a link, 1 month, 2 weeks ago
FAUN.dev()

Collaborating with Terraform: How Teams Can Work Together Without Breaking Things

When working with Terraform in a team environment, common issues may arise such as state locking, version mismatches, untracked local applies, and lack of transparency. Atlantis is an open-source tool that can help streamline collaboration by automatically running Terraform commands based on GitHub .. read more  

Link
@devopslinks shared a link, 1 month, 2 weeks ago
FAUN.dev()

Self Hostable Multi-Location Uptime Monitoring

Vigilant runs distributed uptime checks with self-registering Go-based "outposts" scattered across the globe. Each one handles HTTP and Ping, reports back latency by region, and calls home over HTTPS. The magic handshake? Vigilant plays root CA, handing out ephemeral TLS certs on the fly... read more  

Link
@devopslinks shared a link, 1 month, 2 weeks ago
FAUN.dev()

Test Automation Structure for Single Code Base Projects

The authors discuss the development of a new automation infrastructure post-merger, leading to a unified automation project that can handle all cultures, languages, and clients efficiently. They chose Playwright over Cypress for its improved resource usage and faster execution times, aligning better.. read more  

Link
@devopslinks shared a link, 1 month, 2 weeks ago
FAUN.dev()

How Netflix optimized its petabyte-scale logging system with

Netflix overhauled its logging pipeline to chew through 5 PB/day. The stack now leans on ClickHouse for speed and Apache Iceberg to keep storage costs sane. Out went regex fingerprinting - slow and clumsy. In came a JFlex-generated lexer that actually keeps up. They also ditched generic serialization in fa.. read more  

Link
@devopslinks shared a link, 1 month, 2 weeks ago
FAUN.dev()

The AI Gold Rush Is Forcing Us to Relearn a Decade of DevOps Lessons

Sauce Labs just dropped a reality check: 95% of orgs have fumbled AI projects. The kicker? 82% don’t have the QA talent or tools to keep things from breaking. Even worse, 61% of leaders don’t get software testing 101, leaving AI pipelines full of holes - cultural, procedural, and otherwise. System shift:.. read more  

Link
@devopslinks shared a link, 1 month, 2 weeks ago
FAUN.dev()

A Love Letter to FreeBSD

A Linux user takes FreeBSD for a spin - and comes away impressed. What stands out? Clean, deliberate engineering. Boot environments make updates stress-free. The new pkgbase system adds modularity without chaos. And the OS treats uptime not just as a metric, but as a design goal. The essay makes a solid c.. read more  

Link
@devopslinks shared a link, 1 month, 2 weeks ago
FAUN.dev()

The $1,000 AWS mistake

A missing VPC Gateway Endpoint sent EC2-to-S3 traffic through a NAT Gateway, lighting up over $1,000 in unnecessary data processing charges. All that for in-region traffic hitting an AWS service. Why? AWS defaulted the route to the NAT Gateway. It only takes the free S3 Gateway Endpoint if you tell it to. .. read more  

Grafana Tempo is a distributed tracing backend built for massive scale and low operational overhead. Unlike traditional tracing systems that depend on complex databases, Tempo uses object storage—such as S3, GCS, or Azure Blob Storage—to store trace data, making it highly cost-effective and resilient. Tempo is part of the Grafana observability stack and integrates natively with Grafana, Prometheus, and Loki, enabling unified visualization and correlation across metrics, logs, and traces.

Technically, Tempo supports ingestion from major tracing protocols including Jaeger, Zipkin, OpenCensus, and OpenTelemetry, ensuring easy interoperability. It features TraceQL, a domain-specific query language for traces inspired by PromQL and LogQL, allowing developers to perform targeted searches and complex trace-based analytics. The newer TraceQL Metrics capability even lets users derive metrics directly from trace data, bridging the gap between tracing and performance analysis.
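
To make the TraceQL description more concrete, here is a minimal sketch of what a targeted search might look like. It assumes a Tempo instance reachable at http://localhost:3200 and a service named "checkout"; both are hypothetical, as are the helper names traceql_selector and search_url. The sketch only builds the query string and the request URL for Tempo's search endpoint; it does not send a request.

```python
from urllib.parse import urlencode


def traceql_selector(service, min_duration_ms, errors_only=False):
    """Build a TraceQL span selector for one service (illustrative helper)."""
    conditions = [
        f'resource.service.name = "{service}"',  # resource-scoped attribute
        f"duration > {min_duration_ms}ms",       # intrinsic span duration
    ]
    if errors_only:
        conditions.append("status = error")      # intrinsic span status
    return "{ " + " && ".join(conditions) + " }"


def search_url(base_url, query, limit=20):
    """Build a URL for Tempo's search endpoint; base_url is an assumption."""
    params = urlencode({"q": query, "limit": limit})
    return f"{base_url.rstrip('/')}/api/search?{params}"


# Hypothetical service name and local Tempo address, for illustration only.
query = traceql_selector("checkout", 500, errors_only=True)
print(query)
print(search_url("http://localhost:3200", query))
```

The same selector could, in principle, be piped into a TraceQL Metrics function such as rate() to turn matching spans into a time series, though availability depends on the Tempo version and configuration in use.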

Tempo’s Traces Drilldown UI further enhances usability by providing intuitive, queryless analysis of latency, errors, and performance bottlenecks. Combined with the tempo-cli and tempo-vulture tools, it delivers a full suite for trace collection, verification, and debugging.

Built in Go and following OpenTelemetry standards, Grafana Tempo is ideal for organizations seeking scalable, vendor-neutral distributed tracing to power observability at cloud scale.