
Updates and recent posts about Pelagia
Link
@kala shared a link, 5 days, 10 hours ago
FAUN.dev()

A trillion dollars is a terrible thing to waste

OpenAI co-founder Ilya Sutskever just said the quiet part out loud: scaling laws are breaking down. Bigger models aren’t getting better at thinking, they’re getting worse at generalizing and reasoning. Now he’s eyeing neurosymbolic AI and innate inductive constraints. Yep, the “just make it huge” era m..

Link
@kala shared a link, 5 days, 10 hours ago
FAUN.dev()

Prompts for Open Problems

The author, Ben Recht, proposes five research directions inspired by his graduate machine learning class, arguing for different research rather than just more. These prompts include adopting a design-based view for decision theory, explaining the robust scaling trends in competitive testing, and mov..

Link
@devopslinks shared a link, 5 days, 10 hours ago
FAUN.dev()

Advancing Our Chef Infrastructure: Safety Without Disruption

Slack pulled back the curtain on Slack AI, its LLM-powered assistant built with a fortress mindset. Every customer gets their own isolated environment. Any data passed to vendor LLMs? It's ephemeral. Gone before it can stick. No fine-tuning. No exporting data outside Slack. And there’s a whole middle-lay..

Link
@devopslinks shared a link, 5 days, 10 hours ago
FAUN.dev()

Why we're leaving serverless

Every millisecond matters in the critical path of API authentication. After two years of battling serverless limitations, the entire API stack was rebuilt to reduce end-to-end latency. The move from Cloudflare Workers to stateful Go servers resulted in a 6x performance improvement and simplified arc..

Link
@devopslinks shared a link, 5 days, 10 hours ago
FAUN.dev()

Failure is inevitable: Learning from a large outage, and building for reliability in depth at

Datadog ditched its “never fail” mindset after a March 2023 meltdown knocked out half its Kubernetes nodes and took major user features down with them. The fix? A full-stack rethink built around graceful degradation. The team added disk-based persistence at intake, live-data prioritization, QoS-aware re..

Link
@devopslinks shared a link, 5 days, 10 hours ago
FAUN.dev()

You’ll never see attrition referenced in an RCA

Lorin Hochstein argues that while high-profile engineer attrition is often speculated to contribute to major outages, it is universally absent from public Root Cause Analyses (RCAs). This exclusion occurs because public RCAs aim to reassure customers by focusing on technical fixes, whereas attrition..

Link
@devopslinks shared a link, 5 days, 10 hours ago
FAUN.dev()

Declarative Action Architecture

The Declarative Action Architecture (DAA) is a scalable E2E testing pattern that separates concerns across three distinct layers. The Test Layer is 100% declarative, stating what is being tested without any procedural logic, making tests read like documentation. The core Action Layer implements the execut..

Link
@devopslinks shared a link, 5 days, 10 hours ago
FAUN.dev()

Comparing AWS Lambda Arm64 vs x86_64 Performance Across Multiple Runtimes in Late 2025

A new open-source benchmark looked at 183,000 AWS Lambda invocations, and arm64 beats x86_64 across the board in both cost and speed. Rust on arm64 with SHA-256 tuned in assembly? It clocks in 4–5× faster than x86 in CPU-heavy tasks. Cold starts are snappy too, 5–8× quicker than Node.js and Python..

Link
@devopslinks shared a link, 5 days, 10 hours ago
FAUN.dev()

The story of how we almost got hacked

Team Invictus caught a BEC attempt using WeTransfer to slip in a fake Microsoft 365 login page powered by EvilProxy. Classic Adversary-in-the-Middle move, but dressed up with a slick delivery package. Digging deeper, the team mapped the attacker’s setup and found something bigger: a credential grab c..

News FAUN.dev() Team Trending
@kaptain shared an update, 5 days, 11 hours ago
FAUN.dev()

Agent Sandbox Brings Kernel-Level Guardrails to AI Agents on Kubernetes

Kubernetes gVisor Kata Containers Google Kubernetes Engine (GKE)

Agent Sandbox, a new Kubernetes primitive, was introduced at KubeCon NA 2025 to enhance AI agent management on Kubernetes and Google Kubernetes Engine.

Pelagia is a Kubernetes controller that provides all-in-one management for Ceph clusters installed by Rook. It delivers two main features:

Aggregates all Rook Custom Resources (CRs) into a single CephDeployment resource, simplifying the management of Ceph clusters.
Provides automated lifecycle management (LCM) of Rook Ceph OSD nodes for bare-metal clusters. Automated LCM is managed by the special CephOsdRemoveTask resource.

It is designed to simplify the management of Ceph clusters in Kubernetes installed by Rook.

As heavy Rook users, we had dozens of Rook CRs to manage. So we decided to create a single resource that aggregates all Rook CRs and delivers a smoother LCM experience. This is how Pelagia was born.

Pelagia supports almost the entire Rook CR API, including CephCluster, CephBlockPool, CephFilesystem, CephObjectStore, and others, aggregating them into a single specification. We continuously work on improving Pelagia's API, adding new features, and enhancing existing ones.
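To make the aggregation idea concrete, here is a minimal Python sketch of how one combined specification could be expanded into separate Rook-style manifests. It is not Pelagia's implementation or schema: the layout of the `deployment` dictionary and the `fan_out` and `_manifest` helpers are hypothetical, and only the Rook kinds and the `ceph.rook.io/v1` API version are taken from Rook itself.

```python
from typing import Any

ROOK_API_VERSION = "ceph.rook.io/v1"  # API group/version used by Rook CRs

# Hypothetical mapping from keys in the aggregated spec to Rook kinds.
SECTION_KINDS = (
    ("pools", "CephBlockPool"),
    ("filesystems", "CephFilesystem"),
    ("objectStores", "CephObjectStore"),
)


def _manifest(kind: str, name: str, namespace: str, spec: dict[str, Any]) -> dict[str, Any]:
    """Build a single Kubernetes-style manifest dictionary."""
    return {
        "apiVersion": ROOK_API_VERSION,
        "kind": kind,
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


def fan_out(deployment: dict[str, Any]) -> list[dict[str, Any]]:
    """Expand one aggregated (hypothetical) spec into per-kind manifests."""
    namespace = deployment["namespace"]
    manifests = [
        _manifest("CephCluster", deployment["name"], namespace, deployment["cluster"])
    ]
    for key, kind in SECTION_KINDS:
        for item in deployment.get(key, []):
            manifests.append(_manifest(kind, item["name"], namespace, item["spec"]))
    return manifests


# Illustrative input only; the field layout is an assumption.
example = {
    "name": "demo",
    "namespace": "rook-ceph",
    "cluster": {"mon": {"count": 3}},
    "pools": [{"name": "kubernetes", "spec": {"replicated": {"size": 3}}}],
    "filesystems": [{"name": "cephfs", "spec": {}}],
}

manifests = fan_out(example)
for m in manifests:
    print(m["kind"], m["metadata"]["name"])
```

The point of the sketch is the shape of the problem: one document is the source of truth, and each Rook resource is derived from it rather than edited by hand.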

Pelagia collects the Ceph cluster state and the statuses of all Rook CRs into a single CephDeploymentHealth CR. This resource highlights any issues with the Ceph cluster and the Rook APIs.
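The health roll-up described above can be sketched as a small Python function. This is an illustration under assumptions, not the CephDeploymentHealth schema: the `summarize` function, its input shape, and the `state` and `issues` output keys are hypothetical. It assumes a Ceph health string such as `HEALTH_OK` and a per-CR `phase` value where `Ready` means healthy.

```python
def summarize(ceph_health: str, cr_phases: dict[str, str]) -> dict:
    """Combine cluster health and per-CR phases into one summary.

    ceph_health: overall Ceph health string, e.g. "HEALTH_OK".
    cr_phases:   mapping of "Kind/name" to its reported phase.
    """
    issues = []
    if ceph_health != "HEALTH_OK":
        issues.append(f"ceph cluster health is {ceph_health}")
    # Sort so the issue list is stable regardless of input order.
    for cr, phase in sorted(cr_phases.items()):
        if phase != "Ready":
            issues.append(f"{cr} is in phase {phase}")
    return {"state": "Ok" if not issues else "Failed", "issues": issues}


# Illustrative input only.
summary = summarize(
    "HEALTH_WARN",
    {"CephBlockPool/kubernetes": "Ready", "CephFilesystem/cephfs": "Failure"},
)
print(summary["state"])
for issue in summary["issues"]:
    print("-", issue)
```

A single summary like this lets an operator check one object instead of inspecting every Rook CR status in turn.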

Pelagia also automates lifecycle management of Rook Ceph OSD nodes for bare-metal clusters. This feature is delivered by the CephOsdRemoveTask resource, which automates the process of removing OSD disks and nodes from the cluster. We use this feature in our daily day-2 operations.