Pelagia is a Kubernetes controller that provides all-in-one management for Ceph clusters installed by Rook. It delivers two main features:

Aggregates all Rook Custom Resources (CRs) into a single CephDeployment resource, simplifying the management of Ceph clusters.
Provides automated lifecycle management (LCM) of Rook Ceph OSD nodes for bare-metal clusters. Automated LCM is managed by the special CephOsdRemoveTask resource.

It is designed to simplify the management of Ceph clusters in Kubernetes installed by Rook.

As heavy Rook users, we had dozens of Rook CRs to manage. So one day we decided to create a single resource that would aggregate all Rook CRs and deliver a smoother LCM experience. This is how Pelagia was born.

Pelagia supports almost the entire Rook CR API, including CephCluster, CephBlockPool, CephFilesystem, CephObjectStore, and others, aggregating them into a single specification. We continuously work on improving Pelagia's API, adding new features, and enhancing existing ones.
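To illustrate the aggregation idea, here is a minimal Python sketch that expands one combined specification into separate Rook CR manifests. The input keys (`cluster`, `blockPools`, `filesystems`, `objectStores`) and the function name `expand_to_rook_crs` are hypothetical, chosen only for this example; they are not Pelagia's actual CephDeployment schema or code. Only the `ceph.rook.io/v1` API version and the CR kinds come from Rook itself.

```python
from typing import Any

ROOK_API_VERSION = "ceph.rook.io/v1"  # API group/version used by Rook CRs

# Maps a Rook kind to the (hypothetical) key holding its named specs.
KIND_TO_KEY = (
    ("CephBlockPool", "blockPools"),
    ("CephFilesystem", "filesystems"),
    ("CephObjectStore", "objectStores"),
)


def expand_to_rook_crs(
    name: str, namespace: str, spec: dict[str, Any]
) -> list[dict[str, Any]]:
    """Expand one aggregated spec into individual Rook CR manifests.

    The layout of `spec` is an assumption made for illustration,
    not the real CephDeployment schema.
    """

    def manifest(kind: str, cr_name: str, cr_spec: dict[str, Any]) -> dict[str, Any]:
        return {
            "apiVersion": ROOK_API_VERSION,
            "kind": kind,
            "metadata": {"name": cr_name, "namespace": namespace},
            "spec": cr_spec,
        }

    # Exactly one CephCluster, then one CR per named entry of each kind.
    crs = [manifest("CephCluster", name, spec.get("cluster", {}))]
    for kind, key in KIND_TO_KEY:
        for cr_name, cr_spec in spec.get(key, {}).items():
            crs.append(manifest(kind, cr_name, cr_spec))
    return crs


deployment = {
    "cluster": {"mon": {"count": 3}},
    "blockPools": {"replicapool": {"replicated": {"size": 3}}},
    "filesystems": {"myfs": {}},
}
crs = expand_to_rook_crs("rook-ceph", "rook-ceph", deployment)
for cr in crs:
    print(cr["kind"], cr["metadata"]["name"])
```

The point of the single input document is that an operator edits one resource, while the controller keeps the individual Rook CRs in sync with it.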

Pelagia collects the Ceph cluster state and the statuses of all Rook CRs into a single CephDeploymentHealth CR. This resource highlights any issues with the Ceph cluster or the Rook APIs.
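A rough Python sketch of this kind of status roll-up follows. It assumes each Rook CR reports `status.phase` as `Ready` when healthy and that a CephCluster additionally exposes `status.ceph.health`; the summary layout (`state`, `checked`, `issues`) and the function name `summarize_health` are invented for the example and do not reflect the real CephDeploymentHealth schema.

```python
from typing import Any


def summarize_health(crs: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold per-CR statuses into one summary with a flat list of issues.

    Assumption: a healthy CR has status.phase == "Ready"; a CephCluster
    may also carry status.ceph.health (e.g. "HEALTH_OK").
    """
    issues: list[str] = []
    for cr in crs:
        ref = f'{cr.get("kind", "Unknown")}/{cr.get("metadata", {}).get("name", "unknown")}'
        status = cr.get("status", {})

        phase = status.get("phase")
        if phase != "Ready":
            issues.append(f"{ref}: phase is {phase or 'missing'}")

        ceph_health = status.get("ceph", {}).get("health")
        if ceph_health is not None and ceph_health != "HEALTH_OK":
            issues.append(f"{ref}: Ceph reports {ceph_health}")

    return {
        "state": "Failed" if issues else "Ok",
        "checked": len(crs),
        "issues": issues,
    }


observed = [
    {"kind": "CephCluster", "metadata": {"name": "rook-ceph"},
     "status": {"phase": "Ready", "ceph": {"health": "HEALTH_WARN"}}},
    {"kind": "CephBlockPool", "metadata": {"name": "replicapool"},
     "status": {"phase": "Ready"}},
    {"kind": "CephFilesystem", "metadata": {"name": "myfs"}, "status": {}},
]
summary = summarize_health(observed)
for issue in summary["issues"]:
    print(issue)
```

Having one summary object means an operator checks a single resource instead of inspecting every Rook CR in turn.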

Another important feature of Pelagia is automated lifecycle management of Rook Ceph OSD nodes for bare-metal clusters. It is delivered by the CephOsdRemoveTask resource, which automates the removal of OSD disks and nodes from the cluster. We use this feature in our routine day-2 operations.