Empowering a Globally Distributed Team for Faster Incident Resolution
Reliable service delivery is essential for any online business, and especially for companies like Klever, a leading cryptocurrency and financial services provider with a global user base. With a geographically dispersed team, Klever struggled to manage on-call rotations and to respond to critical incidents on time.
This blog explores how Klever leveraged Squadcast, an on-call scheduling and alerting platform, to optimize their operations and enhance customer experience.
Challenges of Manual On-Call Rotations
Klever’s initial reliance on manual on-call scheduling, often managed through spreadsheets, proved cumbersome for their global workforce. Coordinating schedules across various time zones and ensuring the right personnel were notified during non-business hours presented significant roadblocks.
Squadcast to the Rescue: Streamlining On-Call Scheduling and Alerting
Squadcast’s automated on-call scheduling features changed Klever’s approach. The platform made it simple to create and manage on-call rotations, adjust them, add overrides, and assign clear ownership of alerts, including during off-peak hours.
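This case study doesn't describe how Squadcast models schedules internally, so the following is only a hypothetical sketch of the idea: a "follow-the-sun" rotation where each regional engineer covers a block of UTC hours, plus one-off overrides that take precedence over the regular rotation. The class and function names (`Shift`, `Override`, `who_is_on_call`) and the three regional shift names are illustrative assumptions, not Squadcast's API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Shift:
    """A recurring daily shift expressed in UTC hours (hypothetical model)."""
    engineer: str
    start_hour_utc: int  # inclusive, 0-23
    end_hour_utc: int    # exclusive, 0-23; smaller than start means it wraps midnight

    def covers(self, hour: int) -> bool:
        if self.start_hour_utc < self.end_hour_utc:
            return self.start_hour_utc <= hour < self.end_hour_utc
        # Shift wraps past midnight UTC, e.g. 16:00 -> 00:00.
        return hour >= self.start_hour_utc or hour < self.end_hour_utc


@dataclass
class Override:
    """A one-off replacement that takes precedence over the regular rotation."""
    engineer: str
    start: datetime  # timezone-aware, inclusive
    end: datetime    # timezone-aware, exclusive


def who_is_on_call(shifts, overrides, at: datetime) -> str:
    """Return the engineer on call at the timezone-aware instant `at`."""
    at_utc = at.astimezone(timezone.utc)
    # Overrides are checked first so they win over the regular rotation.
    for override in overrides:
        if override.start <= at_utc < override.end:
            return override.engineer
    for shift in shifts:
        if shift.covers(at_utc.hour):
            return shift.engineer
    raise LookupError(f"no shift covers {at_utc.isoformat()}")


# Hypothetical three-region rotation covering all 24 hours (UTC).
ROTATION = [
    Shift("apac-oncall", 0, 8),
    Shift("emea-oncall", 8, 16),
    Shift("americas-oncall", 16, 0),  # wraps at midnight UTC
]

print(who_is_on_call(ROTATION, [], datetime(2024, 1, 1, 3, tzinfo=timezone.utc)))
```

Normalizing every timestamp to UTC before comparison is a deliberate choice here: it keeps shift boundaries unambiguous for a team spread across time zones, and each engineer's local hours can be derived from the UTC blocks.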
Reduced Response Times with Efficient Alert Routing
Prior to Squadcast, alerts from various sources like AWS CloudWatch, Google Stackdriver, and Prometheus Alertmanager flooded Slack channels. This, coupled with time zone disparities, led to delayed acknowledgements and sluggish response times.
Squadcast’s on-call notifications and escalation policies addressed this problem. Alerts were routed on Slack to the designated on-call engineer, which led to prompt action and reduced both Mean Time To Acknowledge (MTTA) and Mean Time To Respond (MTTR).
Enhanced Visibility into Incident Management
Before adopting Squadcast, Klever had no system for tracking the impact of incidents or the associated response metrics. Squadcast’s Incident Dashboard and Analytics filled this gap by tracking metrics such as MTTA, MTTR, and incident severity, which let the team analyze how incidents affected upstream and downstream services.
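To make the two headline metrics concrete, here is a minimal sketch of how MTTA and MTTR can be computed from incident timestamps. It assumes one common convention: both are measured from the moment the incident is triggered, MTTA ends at the first acknowledgement, and MTTR ends at resolution. Definitions of MTTR vary between teams and tools, and this is not necessarily how Squadcast's analytics calculate it; the `Incident` record is a hypothetical shape.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Iterable, Optional


@dataclass
class Incident:
    """Hypothetical incident record with the timestamps the metrics need."""
    triggered: datetime
    acknowledged: Optional[datetime] = None
    resolved: Optional[datetime] = None


def _mean_minutes(deltas: Iterable[timedelta]) -> Optional[float]:
    """Average a collection of durations in minutes; None if there are none."""
    minutes = [d.total_seconds() / 60 for d in deltas]
    return mean(minutes) if minutes else None


def mtta(incidents) -> Optional[float]:
    """Mean Time To Acknowledge: trigger -> acknowledgement, over acked incidents."""
    return _mean_minutes(
        i.acknowledged - i.triggered for i in incidents if i.acknowledged is not None
    )


def mttr(incidents) -> Optional[float]:
    """MTTR, assumed here to run from trigger -> resolution, over resolved incidents."""
    return _mean_minutes(
        i.resolved - i.triggered for i in incidents if i.resolved is not None
    )


t0 = datetime(2024, 1, 1, 12, 0)
incidents = [
    Incident(t0, t0 + timedelta(minutes=4), t0 + timedelta(minutes=30)),
    Incident(t0, t0 + timedelta(minutes=10), t0 + timedelta(minutes=50)),
]
print(mtta(incidents), mttr(incidents))  # 7.0 40.0
```

Incidents that are still open are skipped rather than counted as zero, so an unacknowledged incident never makes MTTA look better than it is.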
Combating Alert Fatigue with Intelligent Alert Suppression
The high volume of alerts, particularly during node outages, overwhelmed Klever’s team. Squadcast’s Suppression Rules let them focus on critical issues by suppressing non-critical alerts, which significantly reduced alert fatigue.
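The case study doesn't show Klever's actual rules or Squadcast's rule syntax, so the following is a hypothetical sketch of the underlying idea: each suppression rule is a predicate over an alert's fields, and any alert that matches a rule is held back instead of paging the on-call engineer. The alert fields (`alertname`, `node`, `severity`), the node name `node-7`, and the two example rules are illustrative assumptions. The second rule mirrors the node-outage scenario above: the `NodeDown` alert still notifies, while the symptoms it causes are suppressed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Alert = Dict[str, str]  # hypothetical flat alert payload


@dataclass
class SuppressionRule:
    description: str
    matches: Callable[[Alert], bool]


def apply_suppression(
    alerts: List[Alert], rules: List[SuppressionRule]
) -> Tuple[List[Alert], List[Alert]]:
    """Split alerts into (notify, suppressed); matching any rule means suppressed."""
    notify: List[Alert] = []
    suppressed: List[Alert] = []
    for alert in alerts:
        if any(rule.matches(alert) for rule in rules):
            suppressed.append(alert)
        else:
            notify.append(alert)
    return notify, suppressed


RULES = [
    SuppressionRule(
        "Suppress non-critical severities",
        lambda a: a.get("severity") in {"info", "warning"},
    ),
    SuppressionRule(
        "During the node-7 outage, keep NodeDown but suppress its symptoms",
        lambda a: a.get("node") == "node-7" and a.get("alertname") != "NodeDown",
    ),
]

alerts = [
    {"alertname": "NodeDown", "node": "node-7", "severity": "critical"},
    {"alertname": "PodCrashLooping", "node": "node-7", "severity": "critical"},
    {"alertname": "HighLatency", "node": "node-2", "severity": "warning"},
]
notify, suppressed = apply_suppression(alerts, RULES)
print([a["alertname"] for a in notify])  # ['NodeDown']
```

Keeping suppressed alerts in a separate list, rather than discarding them, preserves them for later review in a post-mortem.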
Improved Customer Communication through Service Status Pages
Previously, Klever engineers relied on the support team to communicate service status (downtime or degradation) to customers. With Squadcast’s Status Page feature, engineers could update the Status Page directly, keeping customers informed and reducing their dependency on the support team.
The Klever Advantage: A Summary of Key Benefits
- Rapid Incident Response: Squadcast’s efficient alert routing ensures faster acknowledgment and resolution of incidents.
- Simplified On-Call Scheduling: Customizable scheduling features make managing on-call rotations for geographically dispersed teams a breeze.
- Improved Service Visibility: Squadcast’s Status Pages make it easy to communicate service status to customers and stakeholders.
- Enhanced Analytics and Reporting: A centralized Incident Dashboard provides a single source of truth for analyzing incident data and conducting thorough post-mortems.
Squadcast: A Valuable Partner for Klever’s Success
Klever’s Site Reliability Engineer, Kadu Relvas Barral, credits Squadcast with streamlining on-call scheduling, facilitating rotation adjustments, and enabling instant alert routing to the designated on-call engineer.
Squadcast has helped Klever achieve faster incident resolution, better service visibility, and clearer customer communication. As Klever continues to scale its operations, Squadcast remains a valuable partner in delivering reliable service.
Squadcast is an Incident Management tool that’s purpose-built for SRE. Get rid of unwanted alerts, receive relevant notifications and integrate with popular ChatOps tools. Work in collaboration using virtual incident war rooms and use automation to eliminate toil.