Enhancing Netflix Reliability with Service-Level Prioritized Load Shedding

@faun ・ Jul 11, 2024

https://netflixtechblog.com/enhancing-netflix-reliability-wi...


Without prioritized load shedding, the availability of both user-initiated and prefetch requests drops during latency spikes. With it, user-initiated requests keep 100% availability and only prefetch requests are throttled. During an infrastructure outage, prioritized load shedding kept the availability of user-initiated requests above 99.4% despite a 12x spike in prefetch requests. Netflix has built an internal library that lets services shed load by priority, using CPU utilization and predefined priority buckets, so that the user experience stays consistent during load spikes.
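To make the idea concrete, here is a minimal sketch of CPU-based prioritized load shedding. It is not Netflix's library. The summary only says that requests are placed in predefined priority buckets and shed based on CPU utilization, so the bucket names (`CRITICAL`, `DEGRADED`, `BEST_EFFORT`, `BULK`), the per-bucket CPU thresholds, the `Request` type, and the `cpu_sampler` callable are all hypothetical. Each bucket gets its own CPU threshold, and lower-priority buckets get lower thresholds, so prefetch traffic is rejected well before user-initiated traffic is affected.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Mapping


class Priority(IntEnum):
    """Hypothetical priority buckets; a lower value means more important."""
    CRITICAL = 0      # e.g. user-initiated requests
    DEGRADED = 1
    BEST_EFFORT = 2
    BULK = 3          # e.g. prefetch requests


# Hypothetical CPU utilization thresholds (percent). Once CPU reaches a
# bucket's threshold, requests in that bucket are rejected.
SHED_THRESHOLDS: Mapping[Priority, float] = {
    Priority.CRITICAL: 95.0,
    Priority.DEGRADED: 85.0,
    Priority.BEST_EFFORT: 75.0,
    Priority.BULK: 60.0,
}


@dataclass(frozen=True)
class Request:
    """Hypothetical request shape carrying only what this sketch needs."""
    path: str
    is_prefetch: bool


def classify(request: Request) -> Priority:
    # Prefetch work can be retried later, so it goes in the lowest bucket.
    return Priority.BULK if request.is_prefetch else Priority.CRITICAL


class PrioritizedLoadShedder:
    def __init__(self, cpu_sampler: Callable[[], float],
                 thresholds: Mapping[Priority, float] = SHED_THRESHOLDS):
        # cpu_sampler returns the current CPU utilization as 0-100.
        self._cpu_sampler = cpu_sampler
        self._thresholds = dict(thresholds)

    def admit(self, request: Request) -> bool:
        """Return True to serve the request, False to shed it."""
        cpu = self._cpu_sampler()
        return cpu < self._thresholds[classify(request)]


if __name__ == "__main__":
    # At 70% CPU the prefetch bucket (60%) is shed; user-initiated (95%) is not.
    shedder = PrioritizedLoadShedder(cpu_sampler=lambda: 70.0)
    print(shedder.admit(Request("/play", is_prefetch=False)),
          shedder.admit(Request("/prefetch", is_prefetch=True)))  # True False
```

Staggering the thresholds is what keeps user-initiated availability high: as CPU climbs, the lowest buckets are rejected first and the freed capacity goes to higher-priority work. A production version would likely smooth the CPU signal and shed a fraction of a bucket rather than all of it, but those details are assumptions beyond what the summary states.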