Join us

ContentUpdates and recent posts about Unsloth..
Link
@kaptain shared a link, 1 month, 1 week ago
FAUN.dev()

How Cloud Native Infrastructure Powers AI on Kubernetes

A vendor piece from Mirantis arguing that GPU multi-tenancy on Kubernetes is widely misrepresented, with most platforms shipping namespace-based isolation while production GPU clouds require hardware-enforced separation through MIG partitioning, cluster-per-tenant architecture, and DPU-based network.. read more  

How Cloud Native Infrastructure Powers AI on Kubernetes
Link
@kaptain shared a link, 1 month, 1 week ago
FAUN.dev()

v1.36: Moving Volume Group Snapshots to GA

Volume group snapshots reachedGAin Kubernetesv1.36, with the API promoted togroupsnapshot.storage.k8s.io/v1. The feature lets aVolumeGroupSnapshotobject take crash-consistent snapshots across multiple PVCs selected by label, removing the need to quiesce applications that span separate data and log v.. read more  

Link
@kaptain shared a link, 1 month, 1 week ago
FAUN.dev()

v1.36: Declarative Validation Graduates to GA

Declarative validation graduated toGAin Kubernetesv1.36, replacing handwritten Go validation with+k8s:marker tags on field definitions... read more  

Link
@kaptain shared a link, 1 month, 1 week ago
FAUN.dev()

v1.36: Server-Side Sharded List and Watch

Alpha inv1.36, server-side sharded list and watch adds ashardSelectorfield toListOptionsso the API server uses an FNV-1a hash onmetadata.uidormetadata.namespaceto send each controller replica only its slice of the resource collection. This eliminates the cost of every replica deserializing the full .. read more  

Link
@kala shared a link, 1 month, 1 week ago
FAUN.dev()

Orchestrating AI Code Review at scale

Cloudflare engineers built an AI code review platform on OpenCode. They split GitLab integration, model providers, prompts, and policy into separate plugins. A coordinator assigns up to seven domain reviewers across security, performance, code quality, documentation, release checks, and AGENTS.md co.. read more  

Orchestrating AI Code Review at scale
Link
@kala shared a link, 1 month, 1 week ago
FAUN.dev()

How We Built an AI Second Brain for 60K Knowledge Workers

Meta built an AI agent system internally called the AI Second Brain that now has over 63,000 installs and ~10,000 daily active users across engineering, PM, design, legal, finance, comms, and sales, growing from zero in roughly three months after a non-technical PM's adoption post. The architecture .. read more  

How We Built an AI Second Brain for 60K Knowledge Workers
Link
@kala shared a link, 1 month, 1 week ago
FAUN.dev()

Democratizing Machine Learning at Netflix: Building the Model Lifecycle Graph

Netflix's Saish Sali, Nipun Kumar, and Sura Elamurugu describe the Metadata Service (MDS), a graph layer built to connect siloed ML tooling (model registry, pipeline orchestrator, experimentation platform, feature store, dataset platform, identity) across personalization, studio, payments, and ads. .. read more  

Link
@kala shared a link, 1 month, 1 week ago
FAUN.dev()

The AWS MCP Server is now generally available

AWS now offers AWS MCP Server as a managed remote MCP server in US East (N. Virginia) and Europe (Frankfurt). MCP-compatible clients can use existing IAM credentials to access more than 15,000 AWS API operations. For GA, AWS added IAM context keys, documentation retrieval without authentication, low.. read more  

The AWS MCP Server is now generally available
Link
@kala shared a link, 1 month, 1 week ago
FAUN.dev()

Running local models on an M4 with 24GB memory

Local LLMs work best as supervised coding assistants. The writer ran Qwen 3.5 9B (Q4) in LM Studio on a 24GB MacBook Pro and got about 40 tokens per second, with thinking mode, tool use, and a 128K context window. The author saw mixed results: Qwen helped with simple Elixir linter edits, then failed.. read more  

Running local models on an M4 with 24GB memory
Link
@devopslinks shared a link, 1 month, 1 week ago
FAUN.dev()

S3 Files and the changing face of S3

AWS launchedS3 Files, an EFS-backed feature that mounts any S3 bucket or prefix as an NFS filesystem on EC2, containers, or Lambda, with changes batched back to S3 roughly every 60 seconds. Rather than collapsing file and object semantics into a single model (an early design attempt called "EFS3" th.. read more  

S3 Files and the changing face of S3
Unsloth is an open-source toolkit for training and fine-tuning large language models faster and with less memory than a standard Hugging Face stack. Its core library replaces PyTorch's default autograd with custom backpropagation kernels written in OpenAI's Triton language, which is where most of its speed and memory savings come from. It supports LoRA, QLoRA, full fine-tuning, reinforcement learning, pretraining, and 4-bit, 16-bit, and FP8 training, across more than 500 text, vision, audio, and embedding models.

The practical draw is hardware reach. QLoRA workflows in Unsloth let you fine-tune an 8B model on a single 12 GB consumer GPU, and the project headlines roughly 2x faster training with about 70 percent less VRAM versus baseline implementations, though the exact figures vary by model, GPU, and config. A 2026 update added faster mixture-of-experts training, with models like Qwen3-30B-A3B fine-tunable on about 17.5 GB of VRAM. It runs on NVIDIA (including Blackwell and DGX Spark), AMD, and Intel GPUs, with free Colab and Kaggle notebooks for trying it without local hardware.

It fits cleanly into the local-AI workflow. Unsloth integrates with Hugging Face transformers and TRL, and uses llama.cpp to save and run models, exporting to GGUF for Ollama or LM Studio as well as safetensors. As of 2026 it also ships Unsloth Studio, a local no-code GUI that covers the full lifecycle from dataset creation to training to running and comparing GGUF and safetensors models, with tool-calling, web search, and an OpenAI-compatible API, all running offline on Mac and Windows, with the core library under the Apache 2.0 license.