Agent engineering with **small language models (SLMs)**, anywhere from 270M to 32B parameters, calls for a different playbook: tight prompts, logic offloaded to code, clean structured I/O, and systems that degrade gracefully when things go sideways. Pairing **GGUF** model files with the **llama.cpp** runtime lets these agents run locally on CPUs or modest GPUs. No cloud dependency, no network latency tax. Just edge devices pulling their weight.
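
To make that concrete, here is a minimal sketch of the local half of that stack, using the `llama-cpp-python` bindings on top of llama.cpp. The model path, system prompt, and output shape are illustrative assumptions, not a prescribed setup; any instruction-tuned SLM shipped as a quantized GGUF file would slot in. The point it demonstrates is the "clean I/O" part: constrain the model to emit JSON, then let ordinary code, not the model, handle control flow.

```python
# A minimal sketch, not a reference implementation. Assumes the
# llama-cpp-python bindings (pip install llama-cpp-python) and a local
# GGUF file; the model path and tool schema below are hypothetical.
import json
from llama_cpp import Llama

# Load a quantized SLM entirely on-device. n_gpu_layers=-1 offloads all
# layers to the GPU if one is present; set it to 0 for CPU-only inference.
llm = Llama(
    model_path="./models/qwen2.5-7b-instruct-q4_k_m.gguf",  # hypothetical path
    n_ctx=4096,        # keep the context window tight for small models
    n_gpu_layers=-1,
    verbose=False,
)

def extract_action(user_request: str) -> dict:
    """Ask the model for a single tool call, constrained to JSON output."""
    resp = llm.create_chat_completion(
        messages=[
            {
                "role": "system",
                # Tight prompt: one job, one output format.
                "content": (
                    "You route requests to tools. Reply with JSON only: "
                    '{"tool": <string>, "args": <object>}.'
                ),
            },
            {"role": "user", "content": user_request},
        ],
        # Constrain decoding to valid JSON so downstream parsing is safe.
        response_format={"type": "json_object"},
        temperature=0.0,
        max_tokens=256,
    )
    return json.loads(resp["choices"][0]["message"]["content"])

action = extract_action("What's the weather in Lisbon tomorrow?")
print(action)  # e.g. {"tool": "get_weather", "args": {"city": "Lisbon"}}
```

Offloading the routing decision to `json.loads` plus plain Python is what "offloaded logic" means in practice: the model's job stays small and checkable, which is exactly where a 7B-class model tends to stay reliable.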