Slurm Workload Manager is an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system widely used in high-performance computing (HPC). Designed to operate without kernel modifications, Slurm coordinates thousands of compute nodes by allocating resources, launching and monitoring jobs, and arbitrating contention through a queue of pending work.
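A typical interaction with Slurm is a batch script submitted with `sbatch`. The sketch below shows the common pattern; the partition name, resource counts, and time limit are illustrative and depend on how a given cluster is configured:

```bash
#!/bin/bash
#SBATCH --job-name=hello            # name shown in the queue
#SBATCH --partition=batch           # hypothetical partition; site-specific
#SBATCH --nodes=2                   # request two compute nodes
#SBATCH --ntasks-per-node=4         # four tasks on each node
#SBATCH --time=00:10:00             # wall-clock limit (HH:MM:SS)
#SBATCH --output=%x-%j.out          # log file: job-name, job-id

# srun launches the tasks across the allocated nodes
srun hostname
```

The `#SBATCH` directives are parsed by `sbatch` before the shell ever runs, so they must appear before any executable line in the script.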

At its core, Slurm uses a centralized controller (slurmctld) to track cluster state and assign work, while lightweight daemons (slurmd) on each node execute tasks and communicate hierarchically for fault tolerance. Optional components like slurmdbd and slurmrestd extend Slurm with accounting and REST APIs. A rich set of commands—such as srun, squeue, scancel, and sinfo—gives users and administrators full visibility and control.
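In practice, that command set maps onto a short submit-and-monitor loop. The session below is a sketch (the job ID `12345` is made up), run against whatever cluster `slurmctld` is managing:

```bash
sinfo                       # partitions, node counts, and node states
sbatch job.sh               # submit a batch script; prints the new job ID
squeue -u "$USER"           # list your pending and running jobs
scontrol show job 12345     # full controller-side state for one job
scancel 12345               # cancel it if needed
```

Each command talks to `slurmctld`, which in turn instructs the `slurmd` daemons on the allocated nodes; none of them require privileges beyond an ordinary user account.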

Slurm’s modular plugin architecture supports nearly every aspect of cluster operation, including authentication, MPI integration, container runtimes, resource limits, energy accounting, topology-aware scheduling, preemption, and GPU management via Generic Resources (GRES). Nodes are organized into partitions, enabling sophisticated policies for job size, priority, fairness, oversubscription, reservation, and resource exclusivity.
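Partitions and GRES are declared in `slurm.conf`. The excerpt below is a hypothetical sketch (node names, GPU model, and limits invented for illustration) of how nodes, a GPU resource, and two partitions with different policies might be wired together:

```
# Illustrative slurm.conf excerpt -- names and sizes are assumptions
NodeName=node[01-16] CPUs=64 RealMemory=256000 Gres=gpu:a100:4
PartitionName=gpu   Nodes=node[01-16] MaxTime=24:00:00 State=UP
PartitionName=debug Nodes=node[01-04] MaxTime=00:30:00 Default=YES State=UP
```

The GPU type declared via `Gres=` must also be described in `gres.conf` on each node; jobs then request it with options such as `--gres=gpu:2`.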

Widely adopted across academia, research labs, and enterprise HPC environments, Slurm serves as the backbone for many of the world’s top supercomputers, offering a battle-tested, flexible, and highly configurable framework for large-scale distributed computing.