A new reference architecture built on Envoy AI Gateway and KServe brings order to GenAI chaos. It offers one clean interface for routing requests across internal and external LLMs, with access locked down by policies.
It's called the Two-Tier Gateway Architecture. Think of it as a fork in the road: traffic bound for external model APIs takes one path, and requests for internally hosted models take another. The split keeps things tidy while giving you centralized control, autoscaling, telemetry hooks, and room for custom rules.
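To make the split concrete, here is a minimal Python sketch of the kind of decision an entry-point gateway makes: check a policy, then send the request either out to an external provider or on to the internal tier. Everything in it is hypothetical. The model names, the `POLICIES` allow-list standing in for gateway access policies, and the `route_request` function are illustrations only, not Envoy AI Gateway or KServe APIs. Those projects are configured through Kubernetes resources rather than application code like this.

```python
from dataclasses import dataclass

# Hypothetical catalog: models served by external providers (model -> provider)
# and models self-hosted on KServe behind the internal (second) tier.
EXTERNAL_MODELS = {"gpt-4o-mini": "openai", "claude-3-haiku": "anthropic"}
INTERNAL_MODELS = {"llama-3-8b", "mistral-7b"}

# Hypothetical per-tenant allow-list standing in for gateway access policies.
POLICIES = {
    "team-a": {"gpt-4o-mini", "llama-3-8b"},
    "team-b": {"mistral-7b"},
}


@dataclass
class Route:
    tier: str     # "external" (provider API) or "tier2" (internal gateway)
    backend: str  # provider name or internal model name


def route_request(tenant: str, model: str) -> Route:
    """Enforce the tenant's policy first, then pick a path for the model."""
    if model not in POLICIES.get(tenant, set()):
        raise PermissionError(f"{tenant} may not call {model}")
    if model in EXTERNAL_MODELS:
        # External traffic leaves for the provider's API.
        return Route("external", EXTERNAL_MODELS[model])
    if model in INTERNAL_MODELS:
        # Internal traffic is handed to the second tier, which fronts
        # the self-hosted model servers.
        return Route("tier2", model)
    raise LookupError(f"unknown model: {model}")
```

With this table, `route_request("team-a", "llama-3-8b")` returns `Route(tier='tier2', backend='llama-3-8b')`, while `route_request("team-b", "gpt-4o-mini")` raises `PermissionError`. Checking policy before choosing a path reflects the idea of a single entry point enforcing the same rules on internal and external traffic.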