
Updates and recent posts about Slurm.
Link
@devopslinks shared a link, 6 months ago
FAUN.dev()

S3 Storage Classes: Fast Access

A cost deep-dive breaks down three AWS S3 storage classes - Standard, Standard-IA, and Glacier Instant Retrieval - with sharp, interactive visualizations. It maps out the tradeoffs: storage cost, access frequency, and early deletion pain. Key tipping points surface: - Use Standard-IA if you read the objec..

Link
@devopslinks shared a link, 6 months ago
FAUN.dev()

A complete guide to HTTP caching

A fresh guide reframes HTTP caching as less of a tweak, more of an architectural move. It breaks caching into layers - browser memory, CDNs, reverse proxies, app stores - and shows how each one plays a part (or gets in the way). It gets granular with headers like Cache-Control, ETag, and Vary, calling ..

Link
@devopslinks shared a link, 6 months ago
FAUN.dev()

WTF is ... - AI-Native SAST?

AI-native SAST is replacing the “LLM as magic scanner” myth. Instead, the smart play is combining language models with real static analysis. That’s how teams are catching the gnarlier stuff - like business logic bugs - that usually slip through. The trick? Use static analysis to grab clean, relevant ..

Link
@devopslinks shared a link, 6 months ago
FAUN.dev()

Unlocking self-service LLM deployment with platform engineering

A new platform stack - Port + GitHub Actions + HCP Terraform - is turning LLM deployment into a clean self-service flow. The result: predictable, governed pipelines that ship faster. Infra gets standardized. Provisioning? Handled through GitHub Actions. Policies? Baked in via HCP Terraform. Port tie..

Link
@devopslinks shared a link, 6 months ago
FAUN.dev()

Post-quantum (ML-DSA) code signing with AWS Private CA and AWS KMS

AWS Private CA now supports post-quantum ML-DSA X.509 certificates. That means quantum-resistant roots of trust - for code signing, mTLS, and device auth. It's wired up with AWS KMS, so you can handle signing workflows using ML-DSA keys and verify them with standard tools like OpenSSL using CMS detached..

Link
@devopslinks shared a link, 6 months ago
FAUN.dev()

Terraform Stacks: A Deep-Dive for Azure Practitioners in Europe

Terraform Stacks just hit GA on HCP Terraform, and they bring some real structure to the chaos. Think modular, declarative, and way less workspace spaghetti. Build reusable components (a.k.a. modules), bundle them into deployments, and wire up stacks using publish/consume patterns - complete with automated..

News FAUN.dev() Team
@varbear shared an update, 6 months ago
FAUN.dev()

New MCP Release v0.10.0 Supercharges AI-Assisted Web Development

chrome-devtools-mcp

Chrome DevTools MCP v0.10.0 unlocks deeper AI-powered debugging with new tools for DOM access, network request detection, page reload automation, performance insights, and snapshot saving.

Google Launches Chrome DevTools MCP Server Preview for AI-Driven Web Debugging
News FAUN.dev() Team Trending
@varbear shared an update, 6 months ago
FAUN.dev()

AWS Lambda Gets Python 3.14: Faster, Smarter, and More Serverless-Friendly

AWS Lambda

Python 3.14 is now available in AWS Lambda, enabling developers to leverage new Python features for serverless applications.

News FAUN.dev() Team
@kaptain shared an update, 6 months ago
FAUN.dev()

The Most Absurd (and Brilliant) Kubernetes Cluster at KubeCon 2025

Kubernetes Talos Linux

Engineer Justin Garrison showcased a backpack-sized petaflop Kubernetes cluster at KubeCon 2025, demonstrating localized AI capabilities without cloud reliance.

News FAUN.dev() Team
@kaptain shared an update, 6 months ago
FAUN.dev()

Google Breaks Kubernetes Limits Again: Inside the 130,000-Node GKE Cluster

Google Kubernetes Engine (GKE) kueue

Google successfully operates a 130,000-node Kubernetes cluster to enhance GKE's scalability for AI workloads.

Control plane throughput: Sustaining up to 1,000 operations per second for both Pod creation and Pod binding during intense scheduling phases.
Slurm Workload Manager is an open-source, fault-tolerant, and highly scalable cluster management and scheduling system widely used in high-performance computing (HPC). Designed to operate without kernel modifications, Slurm coordinates thousands of compute nodes by allocating resources, launching and monitoring jobs, and managing contention through its flexible scheduling queue.

At its core, Slurm uses a centralized controller (slurmctld) to track cluster state and assign work, while lightweight daemons (slurmd) on each node execute tasks and communicate hierarchically for fault tolerance. Optional components like slurmdbd and slurmrestd extend Slurm with accounting and REST APIs. A rich set of commands—such as srun, squeue, scancel, and sinfo—gives users and administrators full visibility and control.
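As a rough illustration of how the user-facing commands can be scripted, the sketch below asks squeue for pipe-delimited output and parses it into records. The helper names (`Job`, `parse_squeue`, `list_jobs`) and the chosen field layout are assumptions made for this example, not part of Slurm; only the `squeue` flags `-h` (no header) and `-o` (output format) and the `%i`, `%P`, `%u`, `%T`, `%j` format fields come from Slurm itself.

```python
import subprocess
from typing import NamedTuple

# Assumed layout for this sketch: job id, partition, user, state, job name.
# The job name goes last so a "|" inside a name cannot shift the other fields.
SQUEUE_FORMAT = "%i|%P|%u|%T|%j"


class Job(NamedTuple):
    """One row of squeue output (hypothetical record type for this example)."""
    job_id: str
    partition: str
    user: str
    state: str
    name: str


def parse_squeue(text: str) -> list[Job]:
    """Parse header-less, pipe-delimited squeue output into Job records."""
    jobs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split into at most five fields; the remainder stays in the name.
        fields = line.split("|", 4)
        if len(fields) != 5:
            continue  # skip lines that do not match the expected layout
        jobs.append(Job(*fields))
    return jobs


def list_jobs() -> list[Job]:
    """Run squeue on a host where Slurm is installed and parse its output."""
    result = subprocess.run(
        ["squeue", "-h", "-o", SQUEUE_FORMAT],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_squeue(result.stdout)
```

On a machine without Slurm, `parse_squeue` can still be exercised on captured text, which keeps the parsing logic testable separately from the cluster.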

Slurm’s modular plugin architecture supports nearly every aspect of cluster operation, including authentication, MPI integration, container runtimes, resource limits, energy accounting, topology-aware scheduling, preemption, and GPU management via Generic Resources (GRES). Nodes are organized into partitions, enabling sophisticated policies for job size, priority, fairness, oversubscription, reservation, and resource exclusivity.
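To make partitions and GRES a little more concrete, here is a minimal sketch that renders a batch script requesting a partition and a number of GPUs. The function name `render_batch_script` and the example partition name `gpu` are hypothetical; partition names are site-specific. The `#SBATCH` options used (`--job-name`, `--partition`, `--nodes`, `--time`, `--gres`) are standard sbatch options.

```python
def render_batch_script(
    job_name: str,
    partition: str,
    gpus_per_node: int,
    time_limit: str,
    command: str,
    nodes: int = 1,
) -> str:
    """Build the text of a Slurm batch script (illustrative helper, not a Slurm API)."""
    if gpus_per_node < 0:
        raise ValueError("gpus_per_node must be non-negative")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")

    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",  # partition names are site-specific
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --time={time_limit}",      # e.g. HH:MM:SS
    ]
    if gpus_per_node > 0:
        # GRES counts are requested per node.
        lines.append(f"#SBATCH --gres=gpu:{gpus_per_node}")
    lines.append("")
    # srun launches the command inside the allocation granted to the batch job.
    lines.append(f"srun {command}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_batch_script("train", "gpu", 2, "01:00:00", "python train.py"))
```

The rendered text could be saved to a file and submitted with `sbatch`. Whether the request is accepted depends on how the site has configured its partitions and GRES.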

Widely adopted across academia, research labs, and enterprise HPC environments, Slurm serves as the backbone for many of the world’s top supercomputers, offering a battle-tested, flexible, and highly configurable framework for large-scale distributed computing.