
Updates and recent posts about Kueue.
@kaptain shared an update, an hour ago
FAUN.dev()

Google Breaks Kubernetes Limits Again: Inside the 130,000-Node GKE Cluster

Tags: Google Kubernetes Engine (GKE), kueue

Google has run a 130,000-node Kubernetes cluster on GKE, an effort aimed at extending GKE's scalability for AI workloads.

Control-plane throughput: the cluster sustained up to 1,000 operations per second for both Pod creation and Pod binding during intense scheduling phases.
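To put the 1,000 operations per second figure in perspective, here is a back-of-the-envelope sketch. It assumes, hypothetically, one Pod per node (130,000 Pods) and a steady 1,000 ops/s for creation and, separately, for binding; the article does not say how many Pods were scheduled or how long the phases lasted.

```python
# Back-of-the-envelope sketch. Assumptions (hypothetical, not from the article):
# one Pod per node, and a steady 1,000 ops/s for each of creation and binding.
NODES = 130_000
OPS_PER_SECOND = 1_000


def seconds_at_rate(ops: int, rate: int) -> float:
    """Time to perform `ops` operations at a sustained `rate` per second."""
    return ops / rate


create_s = seconds_at_rate(NODES, OPS_PER_SECOND)  # Pod creation
bind_s = seconds_at_rate(NODES, OPS_PER_SECOND)    # Pod binding
print(f"create: {create_s:.0f} s, bind: {bind_s:.0f} s")  # create: 130 s, bind: 130 s
```

Under these assumptions, each phase takes roughly two minutes at the quoted rate; if the two phases did not overlap at all, the total would be about double that.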
@kaptain added a new tool, Kueue, 1 hour 38 minutes ago.
Kueue is a Kubernetes-native job queueing and workload management system designed for large-scale, mixed compute environments such as AI/ML training, batch workloads, and HPC workflows. Instead of scheduling individual Pods, Kueue operates at the job level, deciding when a job should run based on resource quotas, fair-sharing policies, cluster availability, and workload priorities.
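The idea of deciding *when a whole job runs*, rather than placing individual Pods, can be illustrated with a toy model. This is a minimal sketch, not Kueue's API or admission algorithm: `Workload`, `admit`, and the single CPU quota are hypothetical simplifications, and real Kueue quotas span multiple resources and flavors.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Hypothetical job-level unit: the whole job requests resources at once.
    name: str
    priority: int
    cpu: int  # total CPU units requested by all of the job's Pods


def admit(queue: list[Workload], quota_cpu: int) -> tuple[list[str], list[str]]:
    """Toy queue: admit whole workloads, highest priority first, while quota remains.

    Returns (admitted names, pending names). A workload that does not fit stays
    pending as a unit; none of its Pods are started.
    """
    admitted: list[str] = []
    pending: list[str] = []
    used = 0
    # Python's sort is stable, so workloads with equal priority keep arrival order.
    for wl in sorted(queue, key=lambda w: -w.priority):
        if used + wl.cpu <= quota_cpu:
            admitted.append(wl.name)
            used += wl.cpu
        else:
            pending.append(wl.name)
    return admitted, pending


queue = [Workload("etl", 1, 4), Workload("train", 10, 8), Workload("eval", 5, 4)]
print(admit(queue, quota_cpu=12))  # (['train', 'eval'], ['etl'])
```

Note that this greedy loop lets a smaller, lower-priority job go ahead of a larger one that is blocked; a strict-FIFO policy would instead stop at the first workload that does not fit.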

Kueue integrates tightly with Kubernetes, working alongside the default scheduler rather than replacing it. It provides features such as all-or-nothing (gang) admission, workload preemption, quota-based sharing across teams or tenants, and support for advanced frameworks like JobSet and Ray. Its goal is to help Kubernetes clusters run efficiently under heavy load while ensuring that critical, latency-sensitive, or large training jobs receive the resources they need without starving lower-priority workloads.
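All-or-nothing (gang) admission and priority preemption can be sketched together in a few lines. This is a hypothetical toy model, not how Kueue implements preemption: `Job`, `admit_with_preemption`, and the single shared CPU capacity are illustrative assumptions. The key property shown is that a job is admitted only if its *entire* gang fits, and lower-priority jobs are evicted only when that makes the whole gang fit.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int
    pods: int         # Pods that must all start together (the "gang")
    cpu_per_pod: int

    @property
    def cpu(self) -> int:
        return self.pods * self.cpu_per_pod


def admit_with_preemption(job: Job, running: list[Job], capacity: int) -> tuple[bool, list[str]]:
    """Gang admission with priority preemption (toy model).

    Mutates `running` on success. Returns (admitted?, names of evicted jobs).
    """
    free = capacity - sum(j.cpu for j in running)
    if job.cpu <= free:
        running.append(job)
        return True, []
    # Only strictly lower-priority jobs may be evicted, lowest priority first.
    victims = sorted((j for j in running if j.priority < job.priority), key=lambda j: j.priority)
    reclaim, chosen = free, []
    for v in victims:
        if reclaim >= job.cpu:
            break
        chosen.append(v)
        reclaim += v.cpu
    if reclaim < job.cpu:
        # The gang cannot fit even after evictions: admit nothing, evict nothing.
        return False, []
    for v in chosen:
        running.remove(v)
    running.append(job)
    return True, [v.name for v in chosen]


running = [Job("batch-a", 1, 4, 2), Job("batch-b", 2, 2, 2)]
print(admit_with_preemption(Job("train", 10, 4, 2), running, capacity=16))  # (True, ['batch-a'])
```

Refusing a gang that cannot fit, rather than starting some of its Pods, avoids the situation where partially started jobs hold resources while waiting for peers that never get scheduled.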