Slurm Workload Manager is an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system widely used in high-performance computing (HPC). Designed to operate without kernel modifications, Slurm coordinates thousands of compute nodes by allocating resources to users, launching and monitoring jobs on the allocated nodes, and arbitrating contention for resources through a queue of pending jobs.
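To make the idea of arbitrating contention through a queue concrete, here is a toy sketch in Python. It is not Slurm's scheduling algorithm: the `Job` class, the `schedule` function, and the strict "highest priority first, stop at the first job that does not fit" rule are all assumptions chosen for illustration, and real Slurm scheduling (multifactor priority, backfill, and so on) is considerably richer.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int      # number of nodes the job requests
    priority: int   # higher value is considered first (illustrative only)


def schedule(free_nodes: int, pending: list[Job]) -> tuple[list[Job], list[Job]]:
    """Start jobs in priority order until one does not fit.

    Toy model: once a job is blocked, every lower-priority job also waits,
    even if it would fit (that is, no backfill).
    """
    started: list[Job] = []
    waiting: list[Job] = []
    blocked = False
    # sorted() is stable, so equal priorities keep submission order.
    for job in sorted(pending, key=lambda j: -j.priority):
        if not blocked and job.nodes <= free_nodes:
            free_nodes -= job.nodes
            started.append(job)
        else:
            blocked = True
            waiting.append(job)
    return started, waiting


if __name__ == "__main__":
    queue = [Job("A", 2, 10), Job("B", 4, 5), Job("C", 1, 1)]
    started, waiting = schedule(free_nodes=4, pending=queue)
    print([j.name for j in started], [j.name for j in waiting])
```

With four free nodes, job A starts, job B must wait for more nodes, and job C waits behind B even though it would fit. That last effect is exactly the kind of idle capacity that more sophisticated scheduling policies try to reclaim.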

At its core, Slurm uses a centralized controller (slurmctld) to track cluster state and assign work, while a lightweight daemon (slurmd) on each compute node executes tasks and communicates hierarchically for fault tolerance. Optional components extend this core: slurmdbd records accounting data in a database, and slurmrestd exposes a REST API. A rich set of commands, such as srun, squeue, scancel, and sinfo, lets users and administrators launch, inspect, and cancel jobs and check the state of nodes and partitions.
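As a small illustration of driving these commands from a script, the following Python sketch calls squeue and turns its output into dictionaries. The helper names (`parse_squeue`, `list_jobs`) and the pipe-delimited layout are choices made for this example, not part of Slurm; the sketch assumes squeue is on the PATH and uses the `--noheader`, `--format`, and `--user` options with the `%i`, `%P`, `%j`, `%u`, and `%T` format fields.

```python
from __future__ import annotations

import subprocess

# Job id, partition, job name, user, and state, separated by "|".
SQUEUE_FORMAT = "%i|%P|%j|%u|%T"
FIELDS = ("job_id", "partition", "name", "user", "state")


def parse_squeue(text: str) -> list[dict[str, str]]:
    """Parse pipe-delimited squeue output into one dict per job."""
    jobs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split("|")
        if len(parts) != len(FIELDS):
            # Skip lines that do not match the expected layout,
            # for example a job name that itself contains "|".
            continue
        jobs.append(dict(zip(FIELDS, parts)))
    return jobs


def list_jobs(user: str | None = None) -> list[dict[str, str]]:
    """Run squeue (requires a working Slurm installation) and parse the result."""
    cmd = ["squeue", "--noheader", f"--format={SQUEUE_FORMAT}"]
    if user:
        cmd.append(f"--user={user}")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_squeue(result.stdout)
```

Keeping the parsing separate from the subprocess call means the parser can be exercised on captured text without access to a cluster; only `list_jobs` needs a live Slurm installation.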

Slurm’s modular plugin architecture supports nearly every aspect of cluster operation, including authentication, MPI integration, container runtimes, resource limits, energy accounting, topology-aware scheduling, preemption, and GPU management via Generic Resources (GRES). Nodes are organized into partitions, enabling sophisticated policies for job size, priority, fairness, oversubscription, reservation, and resource exclusivity.
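To show how partitions and GRES surface in day-to-day use, here is a hedged Python sketch that renders a batch script for submission with sbatch (a standard Slurm command not mentioned above). The function name `render_batch_script`, the partition name "gpu", and the training command are hypothetical; the `#SBATCH` options used (`--job-name`, `--partition`, `--nodes`, `--time`, `--gres`) are standard, but which partitions and GRES types exist depends entirely on the site's configuration.

```python
def render_batch_script(
    job_name: str,
    partition: str,
    nodes: int,
    gpus_per_node: int,
    time_limit: str,
    command: str,
) -> str:
    """Build the text of a batch script; it does not submit anything."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",   # which partition to queue in
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --time={time_limit}",       # wall-clock limit, e.g. 01:00:00
    ]
    if gpus_per_node > 0:
        # GRES requests are per node; "gpu" must be a GRES type defined by the site.
        lines.append(f"#SBATCH --gres=gpu:{gpus_per_node}")
    lines.append("")
    lines.append(f"srun {command}")           # launch the tasks inside the allocation
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_batch_script("train", "gpu", 2, 4, "01:00:00", "python train.py"))
```

The rendered text could be written to a file and passed to sbatch. Generating it from a function keeps the resource request (partition, node count, GPUs) in one reviewable place.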

Widely adopted across academia, research labs, and enterprise HPC environments, Slurm serves as the backbone for many of the world’s top supercomputers, offering a battle-tested, flexible, and highly configurable framework for large-scale distributed computing.