NVIDIA shows how to fine-tune Nemotron-Nano-9B-V2 to handle new CLI tools - without touching real user data. The trick? A mix of synthetic data, reinforcement learning with verifiable rewards (RLVR), and their home-grown trainer stack: NeMo Gym plus GRPO.
The result: an LLM agent that adapts fast, plays nice with tools like LangGraph, and only runs commands it can prove are safe, with humans in the loop if needed.










