Join us

ContentUpdates and recent posts about Unsloth..
Link
@varbear shared a link, 1 month, 2 weeks ago
FAUN.dev()

How To Make a Fast Dynamic Language Interpreter

Zef's AST-walking interpreter posts a 16.6× speed-up. The gains come from surgical changes:64-bit tagged values,AST node & RMW specialization,symbol hash-consing,inline caches, and a shapedobject model. Developers built it onFil-C++and later ported it toYolo-C++. The Yolo build adds ~4x speed, at th.. read more  

Link
@varbear shared a link, 1 month, 2 weeks ago
FAUN.dev()

How We Reduced Median Memory Estimation Error by 99%, With the Help of AI

The compaction pipeline at Mixpanel ran into memory estimation issues causing OOMKills. By implementing AI-assisted analysis, they were able to reduce median estimation errorby 99%, leading to a significant improvement in memory estimation accuracy. Through thorough analysis and exploration of alter.. read more  

How We Reduced Median Memory Estimation Error by 99%, With the Help of AI
Link
@kaptain shared a link, 1 month, 2 weeks ago
FAUN.dev()

v1.36: In-Place Vertical Scaling for Pod-Level Resources Graduates to Beta

Kubernetes v1.36 moves In-Place Pod-Level Resources Vertical Scaling to Beta and flips the feature gate on by default. Operators can patch a Pod's aggregate resource to resize running Pods. Often no container restart is needed. Kubelet breaks the Pod-level change into per-container resize events. It.. read more  

Link
@kaptain shared a link, 1 month, 2 weeks ago
FAUN.dev()

From Ingress NGINX to Higress: migrating 60+ resources in 30 minutes with AI

With the March 2026 retirement ofIngress NGINX, teams face an urgent compliance mandate. They must replace unpatched controllers. EnterHigress. Built onEnvoyandIstio. It unifies LLM protocols, enforces token rate limits, caches prompts, hostsMCP, and usesxDSfor zero-downtime. AnAI agentpaired withhi.. read more  

From Ingress NGINX to Higress: migrating 60+ resources in 30 minutes with AI
Link
@kaptain shared a link, 1 month, 2 weeks ago
FAUN.dev()

v1.36: Tiered Memory Protection with Memory QoS

Kubernetes v1.36 rolls out Memory QoS (alpha). Opt-inmemory reservation. Tiered protection by QoS class. Kubelet observability metrics. Kernel-version warnings. It separatesthrottlingfromreservation. A feature gate enables throttling. A kubelet config field controls tieredcgroup v2protection:Guarant.. read more  

Link
@kaptain shared a link, 1 month, 2 weeks ago
FAUN.dev()

Auto-Diagnosing Kubernetes Alerts with HolmesGPT and CNCF Tools

STCLab built an AI investigation pipeline withHolmesGPT, a 200-linePythonplaybook, andOpenTelemetry. It streamedMimir,Loki, andTempointo Slack threads. Metadata-driven markdownrunbookslimited tools per namespace, cut wasted tool calls from 16 to 2, and let the same model resolve alerts faster... read more  

Auto-Diagnosing Kubernetes Alerts with HolmesGPT and CNCF Tools
Link
@kaptain shared a link, 1 month, 2 weeks ago
FAUN.dev()

v1.36: Staleness Mitigation and Observability for Controllers

Kubernetes v1.36 shipsclient-goatomicFIFOprocessing and cache-introspection APIs. Controllers detect stale informer state and skip acting on it. kube-controller-managerenables the capability by default for four high-contention pod controllers. It addsalpha metricsfor skipped syncs and informer resou.. read more  

Link
@kala shared a link, 1 month, 2 weeks ago
FAUN.dev()

Monitoring LLM behavior: Drift, retries, and refusal patterns

Traditional software is predictable due to determinism, while generative AI is unpredictable. Engineers need a new infrastructure layer, the AI Evaluation Stack, to ship enterprise-ready AI products. The stack includes deterministic assertions and model-based assertions to ensure structural integrit.. read more  

Link
@kala shared a link, 1 month, 2 weeks ago
FAUN.dev()

An open-weights Chinese model just beat Claude, GPT-5.5, and Gemini in a programming challenge

The AI Coding Contest Day 12 matched ten models on a sliding‑letter puzzle. Open‑weightsKimi K2.6took first: 22 match points (7‑1‑0).MiMo V2‑Proscored second by blasting claims for intact ≥7‑letter seeds (43 points).GPT‑5.5andClaude Opus 4.7landed third and fifth. Grids ran10×10→30×30. Heavy scrambl.. read more  

An open-weights Chinese model just beat Claude, GPT-5.5, and Gemini in a programming challenge
Link
@kala shared a link, 1 month, 2 weeks ago
FAUN.dev()

Introducing the Agent Readiness score. Check to see if your site is agent-ready

Cloudflare launchedIsItAgentReady. It scans200kdomains, scoresagent readiness, publishes weekly adoption charts, and exposes results via anAPI. It checksrobots.txt,llms.txt, content negotiation viaAccept: text/markdown,API Catalog,.well-known/mcp.json, OAuth discovery, andx402payments. Cloudflare ov.. read more  

Introducing the Agent Readiness score. Check to see if your site is agent-ready
Unsloth is an open-source toolkit for training and fine-tuning large language models faster and with less memory than a standard Hugging Face stack. Its core library replaces PyTorch's default autograd with custom backpropagation kernels written in OpenAI's Triton language, which is where most of its speed and memory savings come from. It supports LoRA, QLoRA, full fine-tuning, reinforcement learning, pretraining, and 4-bit, 16-bit, and FP8 training, across more than 500 text, vision, audio, and embedding models.

The practical draw is hardware reach. QLoRA workflows in Unsloth let you fine-tune an 8B model on a single 12 GB consumer GPU, and the project headlines roughly 2x faster training with about 70 percent less VRAM versus baseline implementations, though the exact figures vary by model, GPU, and config. A 2026 update added faster mixture-of-experts training, with models like Qwen3-30B-A3B fine-tunable on about 17.5 GB of VRAM. It runs on NVIDIA (including Blackwell and DGX Spark), AMD, and Intel GPUs, with free Colab and Kaggle notebooks for trying it without local hardware.

It fits cleanly into the local-AI workflow. Unsloth integrates with Hugging Face transformers and TRL, and uses llama.cpp to save and run models, exporting to GGUF for Ollama or LM Studio as well as safetensors. As of 2026 it also ships Unsloth Studio, a local no-code GUI that covers the full lifecycle from dataset creation to training to running and comparing GGUF and safetensors models, with tool-calling, web search, and an OpenAI-compatible API, all running offline on Mac and Windows, with the core library under the Apache 2.0 license.