Pelagia

Pelagia is a Kubernetes controller that implements lifecycle management for Ceph clusters managed by Rook.

Pelagia is a Kubernetes controller that provides all-in-one management for Ceph clusters installed by Rook. It delivers two main features:

Aggregates all Rook Custom Resources (CRs) into a single CephDeployment resource, simplifying the management of Ceph clusters.
Provides automated lifecycle management (LCM) of Rook Ceph OSD nodes for bare-metal clusters. This automation is driven by the dedicated CephOsdRemoveTask resource.

Its goal is to simplify the management of Rook-installed Ceph clusters in Kubernetes.

As heavy Rook users, we had dozens of Rook CRs to manage. So we decided to create a single resource that would aggregate all of them and deliver a smoother LCM experience. This is how Pelagia was born.

It supports almost the entire Rook CR API, including CephCluster, CephBlockPool, CephFilesystem, CephObjectStore, and others, and aggregates them into a single specification. We continuously improve Pelagia's API by adding new features and enhancing existing ones.
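To illustrate the aggregation idea, here is a minimal sketch of how one combined spec could be fanned out into individual Rook custom resources. The aggregated field names (cluster, blockPools, filesystems, objectStores), the fan_out helper and the rook-ceph namespace default are hypothetical and do not reflect Pelagia's actual CephDeployment schema; only the ceph.rook.io/v1 API group and the Rook kind names come from Rook itself.

```python
"""Hypothetical sketch: fan out one aggregated spec into per-kind Rook CRs.

The aggregated field names are assumptions for illustration only,
not Pelagia's real CephDeployment schema.
"""
from __future__ import annotations

from typing import Any

ROOK_API_VERSION = "ceph.rook.io/v1"

# Maps a hypothetical aggregated-spec key to the Rook kind it produces.
LIST_SECTIONS = {
    "blockPools": "CephBlockPool",
    "filesystems": "CephFilesystem",
    "objectStores": "CephObjectStore",
}


def rook_manifest(kind: str, name: str, namespace: str, spec: dict[str, Any]) -> dict[str, Any]:
    """Build a single Rook custom resource manifest as a plain dict."""
    return {
        "apiVersion": ROOK_API_VERSION,
        "kind": kind,
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


def fan_out(aggregated: dict[str, Any], namespace: str = "rook-ceph") -> list[dict[str, Any]]:
    """Expand one aggregated spec into the individual Rook CRs it describes."""
    manifests: list[dict[str, Any]] = []
    cluster = aggregated.get("cluster")
    if cluster is not None:
        manifests.append(rook_manifest("CephCluster", cluster["name"], namespace, cluster["spec"]))
    # Each list section yields one CR per entry, in declaration order.
    for section, kind in LIST_SECTIONS.items():
        for item in aggregated.get(section, []):
            manifests.append(rook_manifest(kind, item["name"], namespace, item["spec"]))
    return manifests


if __name__ == "__main__":
    spec = {
        "cluster": {"name": "rook-ceph", "spec": {"dataDirHostPath": "/var/lib/rook"}},
        "blockPools": [{"name": "replicapool", "spec": {"failureDomain": "host", "replicated": {"size": 3}}}],
    }
    for m in fan_out(spec):
        print(m["kind"], m["metadata"]["name"])
```

The design point is that users edit one document while the controller owns the per-kind objects; a real controller would also reconcile updates and deletions, which this sketch leaves out.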

Pelagia collects the Ceph cluster state and the statuses of all Rook CRs into a single CephDeploymentHealth CR. This resource highlights any issues with the Ceph cluster or the Rook APIs.
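The status roll-up can be pictured with a small sketch like the one below. It assumes simplified inputs (a Ceph health string such as HEALTH_OK, HEALTH_WARN or HEALTH_ERR, plus a phase string per Rook CR keyed as "Kind/name") and treats only the Ready phase as healthy. The summarize function and HealthSummary type are hypothetical and are not the CephDeploymentHealth schema.

```python
"""Hypothetical sketch: roll several statuses into one health summary.

Input shapes are simplified assumptions, not Pelagia's CephDeploymentHealth schema.
"""
from __future__ import annotations

from dataclasses import dataclass, field

# Phases treated as healthy in this sketch (an assumption).
HEALTHY_PHASES = {"Ready"}


@dataclass
class HealthSummary:
    ceph_health: str
    issues: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        """The summary is healthy only when no issues were recorded."""
        return not self.issues


def summarize(ceph_health: str, cr_phases: dict[str, str]) -> HealthSummary:
    """Collect Ceph health and per-CR phases into a single summary.

    cr_phases maps "Kind/name" to that resource's reported phase.
    """
    summary = HealthSummary(ceph_health=ceph_health)
    if ceph_health != "HEALTH_OK":
        summary.issues.append(f"Ceph cluster reports {ceph_health}")
    # Sort for a stable, reproducible issue order.
    for ref, phase in sorted(cr_phases.items()):
        if phase not in HEALTHY_PHASES:
            summary.issues.append(f"{ref} is in phase {phase!r}")
    return summary


if __name__ == "__main__":
    s = summarize("HEALTH_WARN", {"CephFilesystem/fs": "Progressing", "CephBlockPool/p": "Ready"})
    print(s.ok, s.issues)
```

Putting every signal into one object means an operator checks a single resource instead of querying each Rook CR and the Ceph CLI separately.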

Another important feature of Pelagia is automated lifecycle management of Rook Ceph OSD nodes on bare-metal clusters. It is delivered by the CephOsdRemoveTask resource, which automates removing OSD disks and nodes from the cluster. We rely on this feature in our everyday day-2 operations.
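For context, the kind of work such automation replaces can be sketched as an ordered plan of commands for removing OSDs. The steps follow the commonly documented manual Ceph/Rook procedure, not CephOsdRemoveTask's internal logic; the osd_removal_plan helper, the rook-ceph namespace default and the rook-ceph-osd-<id> deployment naming are assumptions based on Rook defaults.

```python
"""Hypothetical sketch: build an ordered removal plan for Rook Ceph OSDs.

Follows the generic manual Ceph/Rook procedure, not Pelagia's internals.
Namespace and deployment naming are assumed Rook defaults.
"""
from __future__ import annotations


def osd_removal_plan(osd_ids: list[int], namespace: str = "rook-ceph") -> list[str]:
    """Return the ordered commands an operator (or controller) would run."""
    plan: list[str] = []
    for osd in osd_ids:
        plan += [
            # 1. Stop placing data on the OSD; Ceph starts migrating its PGs away.
            f"ceph osd out osd.{osd}",
            # 2. Proceed only once Ceph reports the OSD is safe to destroy (retry until it does).
            f"ceph osd safe-to-destroy osd.{osd}",
            # 3. Stop the OSD pod managed by Rook.
            f"kubectl -n {namespace} scale deployment rook-ceph-osd-{osd} --replicas=0",
            # 4. Remove the OSD from the CRUSH map, auth keys, and OSD map in one step.
            f"ceph osd purge {osd} --yes-i-really-mean-it",
            # 5. Delete the now-idle deployment.
            f"kubectl -n {namespace} delete deployment rook-ceph-osd-{osd}",
        ]
    return plan


if __name__ == "__main__":
    for cmd in osd_removal_plan([3]):
        print(cmd)
```

Step 2 is the critical gate: a controller has to wait on data migration between marking an OSD out and destroying it, which is exactly the long-running, error-prone part that makes this a good candidate for automation.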