Updates and recent posts about vLLM.
@kala shared an update, 3 weeks, 2 days ago

Google’s Cloud APIs Become Agent-Ready with Official MCP Support

Apigee Google Cloud Platform Google Kubernetes Engine (GKE) BigQuery

Google adds official support for the Model Context Protocol across its cloud services, introducing managed MCP servers and enterprise capabilities through Apigee.

@devopslinks shared an update, 3 weeks, 2 days ago

AWS Previews DevOps Agent to Automate Incident Investigation Across Cloud Environments

Datadog Amazon CloudWatch Dynatrace New Relic Amazon Web Services

AWS previews an autonomous DevOps Agent that automates incident investigation and improves system reliability, integrating with tools like Amazon CloudWatch and ServiceNow to surface proactive recommendations.

vLLM is an open-source framework for serving and running large language models efficiently at scale. Originally developed at UC Berkeley and now widely adopted across the AI industry, vLLM optimizes inference performance through its PagedAttention mechanism, a memory-management technique that stores the KV cache in non-contiguous blocks and keeps GPU memory waste near zero. It supports continuous batching, tensor parallelism, and pipeline parallelism across GPUs, making it well suited for real-world deployment of foundation models. vLLM integrates with Hugging Face Transformers, exposes an OpenAI-compatible API, and works with orchestration tools like Ray Serve and Kubernetes. Its design lets developers and enterprises host LLMs with lower latency, lower hardware costs, and higher throughput, powering everything from chatbots to enterprise-scale AI services.
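
To make this concrete, here is a minimal sketch of offline batch inference with vLLM's Python API. The model name (facebook/opt-125m), prompts, and sampling settings are illustrative assumptions, not recommendations.

```python
# Minimal offline-inference sketch using vLLM's Python API.
# Model name, prompts, and sampling settings are placeholders.
from vllm import LLM, SamplingParams

# Load a Hugging Face model; vLLM manages the KV cache on the GPU
# via PagedAttention, so no manual memory tuning is needed here.
llm = LLM(model="facebook/opt-125m")

# Sampling configuration for generation.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

prompts = [
    "The capital of France is",
    "Explain continuous batching in one sentence:",
]

# generate() schedules all prompts together; the engine's continuous
# batching keeps the GPU busy as individual requests finish.
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```

For online serving, the same model can instead be exposed through vLLM's OpenAI-compatible HTTP server (for example, `vllm serve facebook/opt-125m`), so existing OpenAI SDK clients only need their base URL pointed at the vLLM endpoint.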