At Google Cloud, we're pleased to share that the llm-d project has been accepted as a Cloud Native Computing Foundation (CNCF) Sandbox project. Built in collaboration with industry leaders such as Red Hat, IBM Research, CoreWeave, and NVIDIA, llm-d aims to provide a serving framework for any model, any accelerator, and any cloud. Alongside it, the GKE Inference Gateway, the Kubernetes LeaderWorkerSet (LWS) API, and vLLM support for Cloud TPUs are strengthening the infrastructure for AI serving at scale.
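To make the multi-host serving piece concrete, here is a minimal sketch of a LeaderWorkerSet manifest. The resource name, image, and container details are hypothetical placeholders, not part of the announcement; only the `LeaderWorkerSet` API shape itself comes from the LWS project.

```yaml
# Sketch of a LeaderWorkerSet: each replica is a leader pod plus
# (size - 1) worker pods, scheduled and scaled together as one unit --
# the pattern multi-host model servers rely on.
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: vllm-multihost             # hypothetical name
spec:
  replicas: 2                      # two independent serving groups
  leaderWorkerTemplate:
    size: 4                        # 1 leader + 3 workers per group
    leaderTemplate:
      spec:
        containers:
        - name: vllm-leader
          image: example.com/vllm-server:latest   # placeholder image
    workerTemplate:
      spec:
        containers:
        - name: vllm-worker
          image: example.com/vllm-server:latest   # placeholder image
```

Because the leader and its workers form a single unit, rolling updates and autoscaling operate on whole groups rather than individual pods, which is what multi-host inference needs.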










