Content
Updates and recent posts about Slurm.
Story
@laura_garcia shared a post, 1 month, 2 weeks ago
Software Developer, RELIANOID

RELIANOID at CII Delhi International Technology Summit 2025

16–17 December 2025, New Delhi, India. Our team continues a packed December schedule, and we’re excited to add another key event: the CII Delhi International Technology Summit 2025. Focused on “Accelerating the Techade”, this summit brings together industry, government, and research leaders to shape..

Link
@anjali shared a link, 1 month, 2 weeks ago
Customer Marketing Manager, Last9

OTel Updates: OpenTelemetry Proposes Changes to Stability, Releases, and Semantic Conventions

OpenTelemetry proposes stability changes: stable-by-default distributions, decoupled instrumentation, and epoch releases for production deployments.

Story
@laura_garcia shared a post, 1 month, 2 weeks ago
Software Developer, RELIANOID

Deploy the RELIANOID Load Balancer Community Edition v7 on Azure using Terraform

🚀 New Technical Guide Available! You can now deploy the RELIANOID Load Balancer Community Edition v7 on Azure using Terraform in just a few minutes:
✔️ Install prerequisites (Terraform, Azure CLI, SSH keys)
✔️ Use the official Terraform module from the Registry
✔️ Automatically provision all Azure r..

Activity
@tairascott gave 🐾 to Helm 4 or Nelm? What's the difference, 1 month, 2 weeks ago.
Activity
@tairascott gave 🐾 to Hidden Correlations Traditional Monitoring Misses, 1 month, 2 weeks ago.
Link
@anjali shared a link, 1 month, 3 weeks ago
Customer Marketing Manager, Last9

How to Track Down the Real Cause of Sudden Latency Spikes

Sudden latency spikes rarely have a single cause. This blog shows how to uncover the real source using traces, histograms, and modern debugging signals.

Link
@anjali shared a link, 1 month, 3 weeks ago
Customer Marketing Manager, Last9

Hidden Correlations Traditional Monitoring Misses

Last9 is built to work with high-cardinality telemetry, and we’ve been covering it in detail through our series. This piece looks at a familiar pain: issues that only show up for a specific tenant or deployment. Why does that context disappear in most monitoring setups?

Story Trending
@shurup shared a post, 1 month, 3 weeks ago
@palark

Helm 4 or Nelm? What's the difference?


Helm 4.0.0 brought several new features to its users, such as Server-Side Apply support and kstatus-based resource watching. Nelm, an alternative to Helm created within werf, a CNCF Sandbox project, has been offering these capabilities since before that release. Nelm has many more new features for Kubernetes deploymen..

Link
@anjali shared a link, 1 month, 3 weeks ago
Customer Marketing Manager, Last9

Which Observability Tool Helps with Visibility Without Overspend

A detailed look at observability platforms so you can choose tools that keep visibility high and costs steady as your systems scale.

Slurm Workload Manager is an open-source, fault-tolerant, and highly scalable cluster management and scheduling system widely used in high-performance computing (HPC). Designed to operate without kernel modifications, Slurm coordinates thousands of compute nodes by allocating resources, launching and monitoring jobs, and managing contention through its flexible scheduling queue.

At its core, Slurm uses a centralized controller (slurmctld) to track cluster state and assign work, while lightweight daemons (slurmd) on each node execute tasks and communicate hierarchically for fault tolerance. Optional components like slurmdbd and slurmrestd extend Slurm with accounting and REST APIs. A rich set of commands—such as srun, squeue, scancel, and sinfo—gives users and administrators full visibility and control.
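
To make the job lifecycle concrete, here is a minimal sketch in Python that writes a small batch script, submits it with sbatch, and polls squeue until the controller no longer lists the job. It assumes a submit host where the Slurm client commands are already on PATH; the job name, resource requests, and polling interval are illustrative, not prescriptive.

import subprocess
import tempfile
import time

# A throwaway batch script: the #SBATCH directives request resources
# (one task, five minutes of wall time, 1 GB of memory); the job body
# just reports which node it landed on.
JOB_SCRIPT = """#!/bin/bash
#SBATCH --job-name=demo
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --mem=1G
srun hostname
"""

def submit_and_wait() -> None:
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as script:
        script.write(JOB_SCRIPT)
        path = script.name

    # --parsable makes sbatch print just the job ID assigned by slurmctld.
    job_id = subprocess.run(
        ["sbatch", "--parsable", path],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    print("submitted job", job_id)

    # Poll the queue until the controller no longer lists the job.
    while True:
        state = subprocess.run(
            ["squeue", "--noheader", "-j", job_id, "-o", "%T"],
            capture_output=True, text=True,
        ).stdout.strip()
        if not state:
            break
        print("state:", state)
        time.sleep(10)

if __name__ == "__main__":
    submit_and_wait()

At any point, scancel followed by the job ID cancels the job, and sinfo reports which partitions and nodes are available.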

Slurm’s modular plugin architecture supports nearly every aspect of cluster operation, including authentication, MPI integration, container runtimes, resource limits, energy accounting, topology-aware scheduling, preemption, and GPU management via Generic Resources (GRES). Nodes are organized into partitions, enabling sophisticated policies for job size, priority, fairness, oversubscription, reservation, and resource exclusivity.
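
As an illustration of GRES-based GPU requests and partition targeting, the following Python sketch builds an sbatch call. The "gpu" partition name and the "gpu:2" GRES string are assumptions; the actual names depend entirely on how a given cluster's slurm.conf is set up.

import subprocess

# Illustrative only: partition and GRES names are site-specific assumptions.
result = subprocess.run(
    [
        "sbatch",
        "--partition=gpu",               # target a GPU partition (site-specific)
        "--gres=gpu:2",                  # request two GPUs via Generic Resources
        "--ntasks=1",
        "--wrap", "srun nvidia-smi -L",  # the wrapped command lists the GPUs visible to the job
    ],
    check=True, capture_output=True, text=True,
)
print(result.stdout.strip())             # e.g. "Submitted batch job 12345"

Running sinfo -o "%P %G" on a real cluster shows each partition together with the Generic Resources it actually exposes, which is the quickest way to replace the assumed names above.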

Widely adopted across academia, research labs, and enterprise HPC environments, Slurm serves as the backbone for many of the world’s top supercomputers, offering a battle-tested, flexible, and highly configurable framework for large-scale distributed computing.