Join us

ContentUpdates and recent posts about Pelagia..
Link
@faun shared a link, 2 weeks, 6 days ago

Becoming a Research Engineer at a Big LLM Lab - 18 Months of Strategic Career Development

To land a big career role like Mistral, mix efficient **tactical** moves (like LeetCode practice) with **strategic** ups, like building a powerful portfolio and a solid network. Balance is key; aim to impress and prepare well without overlooking the power of strategy in shaping a successful career...

Link
@faun shared a link, 2 weeks, 6 days ago

Shai-Hulud npm Supply Chain Attack

Malicious npm packages just leveled up: this one dropped a self-spreading worm that hijacks repos and leaks secrets the moment it lands. It abuses `postinstall` scripts to run TruffleHog and swipe tokens straight from your codebase. Then it uses GitHub Actions to exfiltrate the loot and auto-publis..

Shai-Hulud npm Supply Chain Attack
Link
@faun shared a link, 2 weeks, 6 days ago

Demystifying Log Retention in Azure

Azure logs come in three flavors: **Activity Logs**, **Diagnostic Logs**, and **Log Analytics**. Each with its own rules for retention and billing. The catch? Those differences aren’t quirks—they’re baked in...

Link
@faun shared a link, 2 weeks, 6 days ago

Observability for the Invisible: Tracing Message Drops in Kafka Pipelines

When an event drops silently in a distributed system, it is not a bug, it is an architectural blind spot. Detect, debug, and prevent message loss in Kafka-based streaming pipelines using tools like OpenTelemetry, Fluent Bit, Jaeger, and dead-letter queues. Make sure observability gaps in event strea..

Link
@faun shared a link, 2 weeks, 6 days ago

How FinOps Drives Value for Every Engineering Dollar

Duolingo’s FinOps crew didn’t just track cloud costs—they wired up sharp, automated observability across 100+ microservices. Real-time alerts now catch AI and infra spend spikes before they torch the budget. They sliced TTS costs by 40% with in-memory caching. Dumped pricey CloudWatch metrics for P..

How FinOps Drives Value for Every Engineering Dollar
Link
@faun shared a link, 2 weeks, 6 days ago

Top 30 Argo CD Anti-Patterns to Avoid When Adopting Gitops

A teardown of Argo CD anti-patterns calls out 28 common misfires—stuff like skipping Git for Application CRDs or stuffing Helm/Kustomize config right into Argo CD manifests. Yikes. It pushes for a cleaner setup: use **ApplicationSets** instead of rolling your own YAML, turn on **auto-sync/self-heal..

Link
@faun shared a link, 2 weeks, 6 days ago

Introducing DigitalOcean Organizations, a new and comprehensive account layer

DigitalOcean just dropped **Organizations**—a real upgrade for anyone juggling multiple Teams. Think one top-level account to rule them all: centralized user control, one invoice to track, and org-wide settings for taxes, credits, and permissions...

Introducing DigitalOcean Organizations, a new and comprehensive account layer
Link
@faun shared a link, 2 weeks, 6 days ago

What are Error Budgets? A Guide to Managing Reliability

OneUptime shows how to put **error budgets** to work—keeping feature velocity in check without tanking reliability. The goal: ship fast, stay within SLOs. They do it by tracking **burn rates**, syncing across teams, and tuning SLOs to match how users actually use the product. Less guesswork, more s..

Link
@faun shared a link, 2 weeks, 6 days ago

v1.34: Pod Level Resources Graduated to Beta

Kubernetes v1.34 bumps **Pod Level Resources** to Beta—and flips them on by default. Now you can set CPU, memory, and hugepages limits for the whole Pod, not just per container. That means smoother scheduling, stricter resource caps, and less sidecar thrashing. **Why it matters:** This shifts Kuber..

Link
@faun shared a link, 2 weeks, 6 days ago

Why Rancher's Founders Pivoted From Kubernetes to Agentic AI

Obot.ai just dropped out of stealth with $35M in seed and a big swing: it’s building a control plane for agentic AI, anchored on the now-standard **Model Context Protocol (MCP)**. Its **MCP Gateway** handles registry, secure proxying, RBAC, and observability for MCP servers. Think API gateway, but ..

Why Rancher's Founders Pivoted From Kubernetes to Agentic AI
Pelagia is a Kubernetes controller that provides all-in-one management for Ceph clusters installed by Rook. It delivers two main features:

Aggregates all Rook Custom Resources (CRs) into a single CephDeployment resource, simplifying the management of Ceph clusters.
Provides automated lifecycle management (LCM) of Rook Ceph OSD nodes for bare-metal clusters. Automated LCM is managed by the special CephOsdRemoveTask resource.

It is designed to simplify the management of Ceph clusters in Kubernetes installed by Rook.

Being solid Rook users, we had dozens of Rook CRs to manage. Thus, one day we decided to create a single resource that would aggregate all Rook CRs and deliver a smoother LCM experience. This is how Pelagia was born.

It supports almost all Rook CRs API, including CephCluster, CephBlockPool, CephFilesystem, CephObjectStore, and others, aggregating them into a single specification. We continuously work on improving Pelagia's API, adding new features, and enhancing existing ones.

Pelagia collects Ceph cluster state and all Rook CRs statuses into single CephDeploymentHealth CR. This resource highlights of Ceph cluster and Rook APIs issues, if any.

Another important thing we implemented in Pelagia is the automated lifecycle management of Rook Ceph OSD nodes for bare-metal clusters. This feature is delivered by the CephOsdRemoveTask resource, which automates the process of removing OSD disks and nodes from the cluster. We are using this feature in our everyday day-2 operations routine.