Updates and recent posts about Slurm.
 Activity
@ravikyada started using tool Amazon Web Services, 1 week, 1 day ago.
@varbear shared a link, 1 week, 3 days ago
FAUN.dev()

Why are top university websites serving p0rn? It comes down to shoddy housekeeping.

Researcher Alex Shakhov found scammers commandeering stale CNAME records. They hijack university subdomains (e.g. berkeley.edu, columbia.edu, washu.edu) and serve p0rn and scam pages. Shakhov found hundreds of abused subdomains across at least 34 universities. He counted thousands of hijacked pages indexed ..

@varbear shared a link, 1 week, 3 days ago
FAUN.dev()

PostgreSQL MVCC, Byte by Byte

PostgreSQL's MVCC stores two 32-bit XIDs per tuple: xmin and xmax. The transaction snapshot decides visibility per tuple. Updates append new tuples and mark the old with xmax. VACUUM reclaims versions only when no active snapshot can see them. Long-running REPEATABLE READ snapshots pin versions and cause b..

@varbear shared a link, 1 week, 3 days ago
FAUN.dev()

The AWS Lambda 'Kiss of Death'

A Galera writer node froze after InnoDB undo history ballooned. Pooled AWS Lambda connections left transactions open and pinned MVCC read views. The team killed stalled sessions, enabled innodb_undo_log_truncate, and capped innodb_max_undo_log_size. They also set session transaction_isolation=READ-COMMITTE..

@varbear shared a link, 1 week, 3 days ago
FAUN.dev()

How The Heck Does Shazam Work? (An Interactive Exploration)

A phone captures audio and runs a Fast Fourier Transform (FFT) on short windows. It builds a spectrogram and extracts peaks. Nearby peak pairs form compact hashes (two frequencies + time delta). An inverted index maps those hashes to songs, and timing validates matches. Most services run lookups on servers aga..

@varbear shared a link, 1 week, 3 days ago
FAUN.dev()

I Decompiled the White House's New App

A React Native app built with Expo SDK 54 runs Hermes. It talks to a WordPress REST backend and bundles a 5.5MB Hermes bytecode. Its WebView injects JavaScript to strip cookies, GDPR prompts, and paywall dialogs. The build includes OneSignal's fused-location pipeline, polling at 4.5 and 9.5 minutes and..

@kaptain shared a link, 1 week, 3 days ago
FAUN.dev()

From public static void main to Golden Kubestronaut: The Art of unlearning

The author left JVM monolith ops for Kubernetes. They stacked certs: CKA, CKAD, CKS, KCNA, KCSA, CNCF Golden Kubestronaut. They treat Pods as the atomic deployable. They pick fights: Ingress vs NodePort. They warn about ConfigMap drift. They spotlight runtime primitives: Horizontal Pod Autoscaler and service mesh for..

@kaptain shared a link, 1 week, 3 days ago
FAUN.dev()

Building a fault-tolerant metrics storage system at Airbnb

Airbnb built a metrics system that ingests 50M samples/s, stores 2.5PB of logical time series, and hosts 1.3B active series. They use tenant-per-service grouping and shuffle sharding. They enforce per-tenant guardrails and a consolidated control plane. They shard queries and compaction. They run zone-awar..

@kaptain shared a link, 1 week, 3 days ago
FAUN.dev()

v1.36: User Namespaces in Kubernetes are finally GA

Kubernetes v1.36 promotes User Namespaces to GA on Linux. It brings rootless workload isolation. Kubelet leans on kernel ID-mapped mounts. It sidesteps expensive chown by remapping UID/GID at mount time and confines privileged processes. No more mass-chown screams...

@kaptain shared a link, 1 week, 3 days ago
FAUN.dev()

Why MicroVMs: The Architecture Behind Sandboxes

Docker Sandboxes puts each agent session in a dedicated microVM. Each microVM runs a private Docker daemon inside the VM boundary. That blocks access to the host. A new cross‑platform VMM runs on macOS, Windows, and Linux hypervisors. It slashes cold starts and runs full Docker build, run, and compose work..

Slurm Workload Manager is an open-source, fault-tolerant, and highly scalable cluster management and scheduling system widely used in high-performance computing (HPC). Designed to operate without kernel modifications, Slurm coordinates thousands of compute nodes by allocating resources, launching and monitoring jobs, and arbitrating contention for resources through a queue of pending jobs.

At its core, Slurm uses a centralized controller (slurmctld) to track cluster state and assign work, while lightweight daemons (slurmd) on each node execute tasks and communicate hierarchically for fault tolerance. Optional components like slurmdbd and slurmrestd extend Slurm with accounting and REST APIs. A set of commands, such as srun, squeue, scancel, and sinfo, lets users and administrators launch jobs, inspect the queue, cancel work, and check node and partition state.
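
As a rough illustration of how the user commands can be scripted, the sketch below summarizes queue state from squeue output. It assumes squeue is invoked with `-h` (no header) and `-o "%i|%P|%j|%u|%T"` (job ID, partition, name, user, state); the helper names and the sample rows are hypothetical and only stand in for real cluster output.

```python
import subprocess
from collections import Counter

# Pipe-separated squeue format: job id | partition | job name | user | state.
SQUEUE_FORMAT = "%i|%P|%j|%u|%T"

# Hypothetical sample rows, shaped like `squeue -h -o "%i|%P|%j|%u|%T"` output.
SAMPLE = (
    "101|gpu|train|alice|RUNNING\n"
    "102|cpu|prep|bob|PENDING\n"
    "103|gpu|eval|alice|PENDING\n"
)


def fetch_squeue():
    """Run squeue and return its raw text (needs a working Slurm install)."""
    result = subprocess.run(
        ["squeue", "-h", "-o", SQUEUE_FORMAT],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_squeue(text):
    """Parse pipe-separated squeue rows into dictionaries.

    Assumes job names, partitions, and user names contain no '|' character.
    """
    jobs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        job_id, partition, name, user, state = line.split("|", 4)
        jobs.append({
            "job_id": job_id,
            "partition": partition,
            "name": name,
            "user": user,
            "state": state,
        })
    return jobs


def count_by_state(jobs):
    """Count jobs per state, e.g. RUNNING or PENDING."""
    return Counter(job["state"] for job in jobs)


if __name__ == "__main__":
    # Uses the sample text; swap in fetch_squeue() on a real cluster.
    for state, count in sorted(count_by_state(parse_squeue(SAMPLE)).items()):
        print(f"{state}: {count}")
```

Parsing is kept separate from the subprocess call so the summary logic can be exercised without a cluster.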

Slurm’s modular plugin architecture supports nearly every aspect of cluster operation, including authentication, MPI integration, container runtimes, resource limits, energy accounting, topology-aware scheduling, preemption, and GPU management via Generic Resources (GRES). Nodes are organized into partitions, enabling sophisticated policies for job size, priority, fairness, oversubscription, reservation, and resource exclusivity.
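
To make partitions and GRES a little more concrete, here is a minimal sketch that renders a batch script requesting a partition and, optionally, GPUs through `--gres`. The function name `render_sbatch`, the partition name `gpu`, and the command are hypothetical; actual partition and GRES names depend on how a given cluster is configured.

```python
def render_sbatch(job_name, partition, command,
                  nodes=1, gpus_per_node=0, time_limit="01:00:00"):
    """Build the text of a batch script with #SBATCH directives.

    gpus_per_node maps to --gres=gpu:N, which Slurm applies per node.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --time={time_limit}",
    ]
    if gpus_per_node > 0:
        # Only request GPUs when asked; CPU-only jobs omit the directive.
        lines.append(f"#SBATCH --gres=gpu:{gpus_per_node}")
    lines.append("")
    # srun launches the command as a job step inside the allocation.
    lines.append(f"srun {command}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job: 2 GPUs on one node of a partition named "gpu".
    print(render_sbatch("train", "gpu", "python train.py", gpus_per_node=2))
```

The resulting text could be written to a file and submitted with sbatch; whether the request is accepted depends on the limits configured for the chosen partition.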

Widely adopted across academia, research labs, and enterprise HPC environments, Slurm serves as the backbone for many of the world’s top supercomputers, offering a battle-tested, flexible, and highly configurable framework for large-scale distributed computing.