Updates and recent posts about Slurm.
Link
@devopslinks shared a link, 2 months, 2 weeks ago
FAUN.dev()

LLMs Are Good at SQL. We Gave Ours Terabytes of CI Logs.

Mendral's agent runs ad‑hoc SQL against compressed ClickHouse logs. It traces flaky tests across months and scans up to 4.3B rows per investigation. They denormalize 48 metadata columns per log line. They compress 5.31 TiB down to ~154 GiB (~21 bytes/line), a 35:1 ratio. That turns arbitrary filters in..

Link
@devopslinks shared a link, 2 months, 2 weeks ago
FAUN.dev()

Rendering 100M pixels a second over ssh

A massively multiplayer snake game accessible over ssh, capable of handling thousands of concurrent players and rendering over a hundred million pixels a second. The game utilizes bubbletea for rendering frames and custom techniques to reduce bandwidth usage to around 2.5 KB/sec. Performance improve..

Link
@devopslinks shared a link, 2 months, 2 weeks ago
FAUN.dev()

Google API Keys Weren't Secrets. But then Gemini Changed the Rules

A report reveals Google Cloud's API keys use the same format for public IDs and secret auth. That overlap lets public keys reach the Gemini API. New keys default to Unrestricted. Existing keys can be retroactively granted Gemini access. Google will add scoped defaults, block leaked keys, and notify affe..

Link
@devopslinks shared a link, 2 months, 2 weeks ago
FAUN.dev()

How to scale GitOps in the enterprise: From single cluster to fleet management

In GitOps, the "Argo Ceiling" is the point where tooling that worked at a small scale becomes unmanageable as you scale up to multiple clusters. To address this, you can consider using OCI registries and ConfigHub as alternative state store options. When it comes to secrets management, options like ..

Link
@varbear shared a link, 2 months, 2 weeks ago
FAUN.dev()

I Taught My Dog to Vibe Code Games

DogKeyboard runs on Raspberry Pi. It filters Bluetooth keystrokes, proxies them to Claude Code, and triggers a feeder over Zigbee. Builds use Godot 4.6 and C#. Automated screenshot/replay testers, a scene linter, a shader linter, and an input mapper let Claude Code auto-test, patch, and relaunch games.

Link
@varbear shared a link, 2 months, 2 weeks ago
FAUN.dev()

How we reduced the size of our Agent Go binaries by up to 77%

The Datadog Agent cut the size of its Go binaries by up to 77% in six months by removing unnecessary dependencies and enabling linker optimizations.

Link
@varbear shared a link, 2 months, 2 weeks ago
FAUN.dev()

The best new features of C# 14

C# 14 ships with .NET 10. It adds file-based apps: run a single .cs file from the command line, with no project or solution files. It also adds extension members and extension blocks. They bring extension properties, grouped receivers, and a cleaner extension syntax.

Link
@varbear shared a link, 2 months, 2 weeks ago
FAUN.dev()

Malicious Next.js Repos Target Developers Via Fake Job Interviews

Linked to North Korean fake job-recruitment campaigns, the poisoned repositories are aimed at establishing persistent access to infected machines.

Link
@varbear shared a link, 2 months, 2 weeks ago
FAUN.dev()

The Linux Foundation reveals the "ugly" secret of how open source is draining your budget

Linux Foundation report finds contributors get 2x–5x ROI. It also finds 45% of organizations run private forks that cost ~5,000 labor hours per release. The report introduces an ROI model that values contributions by labor cost, not lines‑of‑code. It simulates cross‑project tradeoffs.

Story
@laura_garcia shared a post, 2 months, 2 weeks ago
Software Developer, RELIANOID

🍺 Cyberattack on Asahi Group: Why Japan’s Industrial Sector Can’t Afford to Wait

We’re resharing this post because its relevance has only grown. Japan’s largest brewer, Asahi Group, was recently hit by a major ransomware attack that disrupted production and logistics operations nationwide. The timing is striking: the incident came just days after Japan enacted its new Cyber Defe..

Slurm Workload Manager is an open-source, fault-tolerant, and highly scalable cluster management and scheduling system widely used in high-performance computing (HPC). Designed to operate without kernel modifications, Slurm coordinates thousands of compute nodes by allocating resources, launching and monitoring jobs, and managing contention through its flexible scheduling queue.

At its core, Slurm uses a centralized controller (slurmctld) to track cluster state and assign work, while lightweight daemons (slurmd) on each node execute tasks and communicate hierarchically for fault tolerance. Optional components like slurmdbd and slurmrestd extend Slurm with accounting and REST APIs. A rich set of commands—such as srun, squeue, scancel, and sinfo—gives users and administrators full visibility and control.
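
As a rough illustration of how the user-facing commands expose controller state, the sketch below summarizes job states from `squeue` output. It assumes `squeue` is invoked with `--noheader --format="%i %T"` (job ID and extended job state); the helper names and the sample text are hypothetical, and the sample stands in for a real cluster so the parsing can be tried without Slurm installed.

```python
import subprocess
from collections import Counter


def parse_squeue(text):
    """Parse lines of '<jobid> <state>' into a list of (jobid, state) tuples."""
    jobs = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            jobs.append((parts[0], parts[1]))
    return jobs


def count_states(jobs):
    """Count how many jobs are in each state."""
    return Counter(state for _, state in jobs)


def query_squeue():
    """Ask the controller for the current queue (needs a working Slurm install)."""
    result = subprocess.run(
        ["squeue", "--noheader", "--format=%i %T"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_squeue(result.stdout)


# Hypothetical sample output, used here instead of a live cluster.
SAMPLE = "101 RUNNING\n102 PENDING\n103 RUNNING\n"
summary = count_states(parse_squeue(SAMPLE))
```

On a real cluster, `count_states(query_squeue())` would produce the same kind of summary from live data.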

Slurm’s modular plugin architecture supports nearly every aspect of cluster operation, including authentication, MPI integration, container runtimes, resource limits, energy accounting, topology-aware scheduling, preemption, and GPU management via Generic Resources (GRES). Nodes are organized into partitions, enabling sophisticated policies for job size, priority, fairness, oversubscription, reservation, and resource exclusivity.
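
To make partitions and GRES a little more concrete, here is a minimal sketch that renders a batch script requesting GPUs through `#SBATCH` directives. The function name, the partition name `gpu`, and the command `./train.sh` are assumptions for illustration only; actual partition names and GRES types depend on how a given cluster is configured.

```python
def render_batch_script(job_name, partition, nodes, gpus_per_node, time_limit, command):
    """Return the text of a batch script that could be passed to sbatch."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",   # partition to submit to
        f"#SBATCH --nodes={nodes}",           # number of nodes to allocate
        f"#SBATCH --gres=gpu:{gpus_per_node}",  # GPUs per node via GRES
        f"#SBATCH --time={time_limit}",       # wall-clock limit, HH:MM:SS
        "",
        f"srun {command}",                    # launch the job step on the allocation
    ]
    return "\n".join(lines) + "\n"


# Hypothetical values: a 'gpu' partition and a './train.sh' command.
script = render_batch_script(
    job_name="demo",
    partition="gpu",
    nodes=2,
    gpus_per_node=4,
    time_limit="01:00:00",
    command="./train.sh",
)
```

The resulting text could be written to a file and submitted with `sbatch`; whether the request is granted depends on the limits and policies attached to the chosen partition.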

Widely adopted across academia, research labs, and enterprise HPC environments, Slurm serves as the backbone for many of the world’s top supercomputers, offering a battle-tested, flexible, and highly configurable framework for large-scale distributed computing.