Join us

ContentUpdates and recent posts about Pelagia..
Link
@faun shared a link, 8 months, 3 weeks ago
FAUN.dev()

LLM Evaluation: Practical Tips at Booking.com

Booking.com built Judge-LLM, a framework where strong LLMs evaluate other models against a carefully curated golden dataset. Clear metric definitions, rigorous annotation, and iterative prompt engineering make evaluations more scalable and consistent than relying solely on humans. **The takeaway**:.. read more  

Link
@faun shared a link, 8 months, 3 weeks ago
FAUN.dev()

Understanding LLMs: Insights from Mechanistic Interpretability

LLMs generate text by predicting the next word using attention to capture context and MLP layers to store learned patterns. Mechanistic interpretability shows these models build circuits of attention and features, and tools like sparse autoencoders and attribution graphs help unpack superposition, r.. read more  

Link
@faun shared a link, 8 months, 3 weeks ago
FAUN.dev()

Guardians of the Agents 

A new static verification framework wants to make runtime safeguards look lazy. It slaps **mathematical safety proofs** onto LLM-generated workflows *before* they run—no more crossing fingers at execution time. The setup decouples **code from data**, then runs checks with tools like **CodeQL** and .. read more  

Link
@faun shared a link, 8 months, 3 weeks ago
FAUN.dev()

Scaling Prometheus: Managing 80M Metrics Smoothly

Flipkart ditched its creakyStatsD + InfluxDBstack for afederated Prometheussetup—built to handle 80M+ time-series metrics without choking. The move leaned intopull-based collection,PromQL's firepower, andhierarchical federationfor smarter aggregation and long-haul queries. Why it matters:Prometheus.. read more  

Scaling Prometheus: Managing 80M Metrics Smoothly
Link
@faun shared a link, 8 months, 3 weeks ago
FAUN.dev()

Writing an operating system kernel from scratch

A barebonestime-sharing OS kernel, written inZig, running onRISC-V. It leans onOpenSBIfor console I/O and timer interrupts. Threads? Statically allocated, each running inuser mode (U-mode). The kernel stays insupervisor mode (S-mode), where it catchessystem callsandcontext switchesvia timer ticks. .. read more  

Writing an operating system kernel from scratch
Link
@faun shared a link, 8 months, 3 weeks ago
FAUN.dev()

Accelerate serverless testing with LocalStack integration in VS Code IDE

The AWS Toolkit for VS Code now hooks straight into **LocalStack**. Run full end-to-end tests for **serverless workflows**—Lambda, SQS, EventBridge, the whole crew—without bouncing between tools or writing boilerplate. Just deploy to LocalStack from the IDE using the **AWS SAM CLI**. It feels like .. read more  

Accelerate serverless testing with LocalStack integration in VS Code IDE
Link
@faun shared a link, 8 months, 3 weeks ago
FAUN.dev()

PostgreSQL maintenance without superuser

PostgreSQL’s moving in on superusers. As of recent releases—starting way back in v9.6 and maturing through PostgreSQL 18 (coming 2025)—there are now **15+ built-in admin roles**. No need to hand out superuser just to get things done. These roles cover the ops spectrum: monitoring, backups, fil.. read more  

PostgreSQL maintenance without superuser
Link
@faun shared a link, 8 months, 3 weeks ago
FAUN.dev()

Magical systems thinking

AI now writes over **25% of Google’s** and as much as **90% of Anthropic’s** code. That’s not a trend—it’s a regime change. Still, the mess in large public systems reminds us: clever analysis isn’t enough. Complex systems don’t behave; they misbehave. When the machines are churning out code, the .. read more  

Magical systems thinking
Link
@faun shared a link, 8 months, 3 weeks ago
FAUN.dev()

Introducing Budget Controls for AWS: Automatically Manage Your Cloud Costs

**Budget Controls for AWS** just got better. The open-source tool now reins in more than just EC2. It wrangles **RDS Aurora**, **SageMaker**, and **OpenSearch** too. Under the hood, it taps **AWS Budgets**, **AWS Config**, and **custom tags** to watch spend like a hawk. Hit a budget threshold? It c.. read more  

Introducing Budget Controls for AWS: Automatically Manage Your Cloud Costs
Link
@faun shared a link, 8 months, 3 weeks ago
FAUN.dev()

SLI Evolution Stages

A new SLI evolution model lays out a maturity roadmap—from rebranded latency/error metrics to ones that actually track business impact. It replaces shallow signals and pulls in the stuff that matters: how service failures hit user goals, tasks, and bottom lines... read more  

SLI Evolution Stages
Pelagia is a Kubernetes controller that provides all-in-one management for Ceph clusters installed by Rook. It delivers two main features:

Aggregates all Rook Custom Resources (CRs) into a single CephDeployment resource, simplifying the management of Ceph clusters.
Provides automated lifecycle management (LCM) of Rook Ceph OSD nodes for bare-metal clusters. Automated LCM is managed by the special CephOsdRemoveTask resource.

It is designed to simplify the management of Ceph clusters in Kubernetes installed by Rook.

Being solid Rook users, we had dozens of Rook CRs to manage. Thus, one day we decided to create a single resource that would aggregate all Rook CRs and deliver a smoother LCM experience. This is how Pelagia was born.

It supports almost all Rook CRs API, including CephCluster, CephBlockPool, CephFilesystem, CephObjectStore, and others, aggregating them into a single specification. We continuously work on improving Pelagia's API, adding new features, and enhancing existing ones.

Pelagia collects Ceph cluster state and all Rook CRs statuses into single CephDeploymentHealth CR. This resource highlights of Ceph cluster and Rook APIs issues, if any.

Another important thing we implemented in Pelagia is the automated lifecycle management of Rook Ceph OSD nodes for bare-metal clusters. This feature is delivered by the CephOsdRemoveTask resource, which automates the process of removing OSD disks and nodes from the cluster. We are using this feature in our everyday day-2 operations routine.