Agent engineering with small language models (SLMs), anywhere from 270M to 32B parameters, calls for a different playbook. Think tight prompts, offloaded logic, clean I/O, and systems that don't fall apart when things go sideways. The current local stack (GGUF quantized weights plus the llama.cpp runtime) lets these agents run on commodity CPUs or GPUs. No cloud, no latency tax. Just edge devices pulling their weight.
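Here is a minimal sketch of what that playbook can look like in practice, assuming the llama-cpp-python bindings and a hypothetical local GGUF model file (swap in whatever instruct-tuned SLM you actually have). The prompt is tight with an explicit output contract, decoding is constrained to JSON for clean I/O, and the decision logic lives in plain code rather than in the model:

```python
# A sketch under stated assumptions, not a definitive implementation.
import json

from llama_cpp import Llama

# Load a quantized SLM entirely on-device. n_gpu_layers=-1 offloads all
# layers to the GPU when one is available; 0 keeps inference on the CPU.
llm = Llama(
    model_path="models/qwen2.5-7b-instruct-q4_k_m.gguf",  # hypothetical path
    n_ctx=4096,
    n_gpu_layers=-1,
    verbose=False,
)

# Tight prompt: one job, explicit output contract, no room to ramble.
messages = [
    {
        "role": "system",
        "content": 'Reply with JSON only: {"intent": str, "confidence": float}.',
    },
    {"role": "user", "content": "Turn off the living room lights."},
]

resp = llm.create_chat_completion(
    messages=messages,
    max_tokens=64,
    temperature=0.0,
    # Clean I/O: constrain decoding to valid JSON instead of hoping for it.
    response_format={"type": "json_object"},
)

# Offloaded logic: the model classifies, plain code decides what happens,
# and bad output degrades gracefully instead of crashing the agent.
try:
    result = json.loads(resp["choices"][0]["message"]["content"])
except (json.JSONDecodeError, KeyError):
    result = {"intent": "unknown", "confidence": 0.0}

print(result)
```

The try/except fallback is the "doesn't fall apart" part: a malformed response collapses to a safe default the surrounding system can route on, rather than an exception that takes the agent down.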