Posts from @6172852785..
Story
@laura_garcia shared a post, 1 year, 4 months ago
Software Developer, RELIANOID

Netdev Recap

Just wrapped up an incredible Netdev 0x18! From cutting-edge innovations in Linux networking to insightful talks from industry leaders, this year’s event was packed with highlights. Curious about what went down? Check out our full recap article here! https://www.relianoid.com/blog/netdev-conference-0..

Story
@squadcast shared a post, 1 year, 4 months ago

Observability: A Deep Dive into Tools, Best Practices, and Examples

Observability is a critical component of modern software development, providing insights into system performance, availability, and quality. The blog delves into the concept of observability, differentiating it from traditional monitoring.

Key points covered include:

Evolution of observability: From system-centric monitoring to service-focused observability in microservices architectures.

Three pillars of observability: Metrics, logs, and traces, their roles, and popular tools (Prometheus, ELK Stack, Jaeger).

Building a comprehensive observability strategy: Best practices like data centralization, quality, alerting, visualization, correlation, anomaly detection, and continuous improvement.

Challenges: Data volume, complexity, tooling, and skillset requirements.

Overall, the blog emphasizes the importance of observability for understanding system behavior, improving performance, and ensuring reliability.

Story
@squadcast shared a post, 1 year, 4 months ago

Conquering On-Call Challenges: A Guide and Best Practices for SRE Teams

The blog provides a comprehensive guide to effective on-call scheduling for SRE teams. It emphasizes the importance of on-call management for maintaining system reliability and preventing team burnout.

Key points include:

The role of on-call scheduling software in automating and optimizing the process.

Strategies for creating balanced and efficient on-call rotations, such as the "follow-the-sun" approach.

The importance of clear communication, documentation, and escalation plans.

The need for regular post-mortem meetings and SRE training.

Tips for fostering a supportive on-call culture.

Ultimately, the blog aims to help SRE teams implement best practices for on-call scheduling, leading to improved team morale, incident response, and overall system reliability.

Story
@adammetis shared a post, 1 year, 4 months ago
DevRel, Metis

The Importance of Being Agile in the Database World

This agility in managing database schema changes is key to maintaining speed and flexibility in our database strategies. But how can we move fast around databases? How can we be agile in the database world? Read on to see.

Story
@squadcast shared a post, 1 year, 4 months ago

Runbook Automation: Achieving Faster Incident Recovery | Squadcast

A runbook is a predefined set of steps or procedures that is usually executed manually by a systems engineer. For instance, say you want to upgrade an application in production, and you have a documented set of steps for doing so. We call this a runbook. It contains procedures to begin, stop, sup..

Story
@ketbostoganashvili shared a post, 1 year, 4 months ago
Technical Content Writer

How to Create an HTML Template That Email Clients Render Well

A developer can’t code an HTML email template using the same technologies and approaches as one would when building a web page. It may sound ridiculous, but it’s the truth. So, let’s try to figure out how valid this statement is.

Story
@laura_garcia shared a post, 1 year, 4 months ago
Software Developer, RELIANOID

Discover the key differences between Active-Active and Active-Standby failover strategies

Ensuring network resilience is critical for maintaining continuous business operations. Discover the key differences between Active-Active and Active-Standby failover strategies, their benefits, use cases, and implementation considerations. Learn how to choose the right approach to keep your network..

Story
@laura_garcia shared a post, 1 year, 4 months ago
Software Developer, RELIANOID

Understanding the CrowdStrike Outage

Understanding the CrowdStrike Outage: The Largest IT Disruption in History. A recent software update from CrowdStrike caused an unprecedented global IT outage, disrupting millions of devices and affecting key sectors like airlines, healthcare, and emergency services. This incident highlights the crit..

Story
@squadcast shared a post, 1 year, 4 months ago

Automating On-Call Scheduling with On-Call Scheduling Software: A Comprehensive Guide

The blog discusses the challenges associated with managing on-call schedules manually, such as errors, time consumption, and inflexibility. It highlights the benefits of using on-call scheduling software to automate the process, including increased efficiency, improved communication, and enhanced visibility.

Key features of on-call scheduling software covered are recurring schedules, escalation policies, overrides, integrations, and analytics. The blog also provides guidance on selecting the right software based on factors like ease of use, customization, integrations, scalability, reliability, and cost.
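
To make the features above concrete, here is a minimal sketch of a recurring rotation with overrides and a simple escalation chain. It is not taken from any particular scheduling product: the function names `on_call_for` and `escalation_chain`, the weekly shift length, and the engineer names are all assumptions chosen for illustration.

```python
from datetime import date

def on_call_for(day, rotation, start, overrides=None, shift_days=7):
    """Return who is on call on `day` (hypothetical helper).

    rotation   -- ordered list of engineers, repeated indefinitely
    start      -- first day of the first shift
    overrides  -- optional {date: engineer} map that wins over the rotation
    shift_days -- length of each recurring shift (assumed weekly by default)
    """
    overrides = overrides or {}
    if day in overrides:
        return overrides[day]
    if day < start:
        raise ValueError("day is before the rotation start")
    shift_index = (day - start).days // shift_days
    return rotation[shift_index % len(rotation)]

def escalation_chain(primary, rotation, levels=2):
    """Return the primary followed by `levels` people to escalate to.

    If the primary is an override who is not in the rotation, escalation
    simply walks the rotation from its beginning (an assumed policy).
    """
    if primary in rotation:
        i = rotation.index(primary)
        return [rotation[(i + k) % len(rotation)] for k in range(levels + 1)]
    return [primary] + rotation[:levels]

rotation = ["ana", "ben", "chen"]
start = date(2024, 1, 1)
overrides = {date(2024, 1, 9): "dee"}  # dee covers a single day

print(on_call_for(date(2024, 1, 8), rotation, start, overrides))  # ben
print(on_call_for(date(2024, 1, 9), rotation, start, overrides))  # dee
print(escalation_chain("ben", rotation))  # ['ben', 'chen', 'ana']
```

Keeping overrides as a separate lookup, rather than editing the rotation itself, means a one-off swap never shifts anyone else's recurring schedule.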

Ultimately, the blog emphasizes the positive impact of automating on-call scheduling on team productivity, incident management, and overall organizational efficiency.

Story
@squadcast shared a post, 1 year, 4 months ago

Silencing the Siren: A Comprehensive Guide to Alert Noise Reduction

This blog post addresses the issue of alert fatigue, which is a common problem for on-call engineers. It provides strategies to minimize the number of irrelevant alerts, allowing teams to focus on critical incidents.

The blog covers:

The negative impacts of alert noise

Optimizing monitoring systems for fewer false alerts

Leveraging on-call tools to manage alert volume effectively

Cultivating a culture of alert management

Advanced techniques for alert noise reduction

Ultimately, the goal is to help readers create a more efficient and less stressful on-call environment.
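
One common way to cut alert volume is to suppress repeats of the same alert within a time window. The sketch below illustrates that idea only; it is not the blog's method or any tool's API. The class name `AlertDeduplicator`, the `(service, check)` key, and the 10-minute window are assumptions.

```python
from datetime import datetime, timedelta

class AlertDeduplicator:
    """Suppress repeat alerts for the same (service, check) pair.

    Hypothetical helper: a notification is sent only if no notification
    for the same key was sent within the last `window`.
    """

    def __init__(self, window=timedelta(minutes=10)):
        self.window = window
        self._last_sent = {}  # (service, check) -> time of last notification

    def should_notify(self, service, check, when):
        key = (service, check)
        last = self._last_sent.get(key)
        if last is not None and when - last < self.window:
            return False  # duplicate inside the window: stay quiet
        self._last_sent[key] = when
        return True

dedup = AlertDeduplicator()
t0 = datetime(2024, 1, 1, 12, 0)

print(dedup.should_notify("api", "latency", t0))                          # True
print(dedup.should_notify("api", "latency", t0 + timedelta(minutes=5)))   # False
print(dedup.should_notify("api", "errors", t0 + timedelta(minutes=5)))    # True
print(dedup.should_notify("api", "latency", t0 + timedelta(minutes=10)))  # True
```

The window is measured from the last notification actually sent, so a continuously firing alert still pages again once per window instead of going silent forever.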