Posts from @aanaethan…
Story
@squadcast shared a post, 1 year, 6 months ago

Effective Incident Postmortems: Learn from Every Outage

This blog post explains what incident postmortems are and why they are important. It details the steps involved in conducting an effective incident postmortem, including creating a timeline, holding a meeting, and capturing key details. The importance of a blameless environment is emphasized. The blog post concludes by recommending resources for further reading on the topic.
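The timeline step can be sketched in a few lines of Python. Events exported from different tools (pager alerts, deploy logs, chat messages) are merged and sorted into one chronological record. This is a minimal sketch, not the post's own method; the `TimelineEvent` structure, the source names, and the sample events are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TimelineEvent:
    """One entry in a postmortem timeline (hypothetical structure)."""
    timestamp: datetime
    source: str        # where the event came from, e.g. "pager" or "deploy-log"
    description: str


def build_timeline(*sources: list[TimelineEvent]) -> list[TimelineEvent]:
    """Merge events from several sources into one chronological list."""
    merged = [event for source in sources for event in source]
    # Sort by time; ties fall back to the source name so the order is deterministic.
    return sorted(merged, key=lambda e: (e.timestamp, e.source))


def format_timeline(events: list[TimelineEvent]) -> str:
    """Render the timeline as 'HH:MM [source] description' lines."""
    return "\n".join(
        f"{e.timestamp:%H:%M} [{e.source}] {e.description}" for e in events
    )


if __name__ == "__main__":
    # Hypothetical events exported from three different tools.
    pager = [TimelineEvent(datetime(2024, 1, 5, 10, 2), "pager", "Error-rate alert fired")]
    deploys = [TimelineEvent(datetime(2024, 1, 5, 9, 55), "deploy-log", "Release rolled out")]
    chat = [TimelineEvent(datetime(2024, 1, 5, 10, 20), "chat", "Rollback started")]
    print(format_timeline(build_timeline(pager, deploys, chat)))
```

Sorting on a single merged list keeps the timeline honest: the deploy that preceded the alert shows up first even though it came from a different tool.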

Story
@squadcast shared a post, 1 year, 6 months ago

The Vital Role of SRE Observability in Ensuring System Reliability

This blog post explains the importance of SRE observability for building reliable systems. Observability, unlike traditional monitoring, goes beyond just checking if something is wrong. It allows SREs to understand what's happening inside a system by looking at its external outputs like metrics, traces, and logs. This data is crucial for troubleshooting, maintaining, and developing scalable systems.

The blog post also highlights the benefits of SRE observability for businesses. By understanding user satisfaction through SLOs (Service Level Objectives), businesses can make better decisions about feature development and resource allocation. Additionally, observability tools can reduce the workload for engineers by automating tasks and providing better insights into system behavior. Overall, SRE observability is essential for ensuring system reliability and business success.

Link
@faun shared a link, 1 year, 6 months ago
FAUN.dev()

Cloud Run and Cloud Storage…now a perfect match

This article describes a recent Cloud Run enhancement that allows Cloud Storage buckets to be mounted as container volumes. With Cloud Storage mounts, you can now mount a bucket as a volume within a Cloud Run container without utilizing additiona…
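Because a mounted bucket appears as a local directory inside the container, application code can read objects with ordinary file I/O. The following Python sketch assumes the volume's mount path is passed in through a hypothetical `BUCKET_MOUNT_PATH` environment variable (the `/mnt/bucket` default is also an assumption); the helper names are illustrative and not part of any Google Cloud API.

```python
import os
from pathlib import Path

# Hypothetical: the mount path configured for the Cloud Storage volume,
# passed in through an environment variable of our own choosing.
MOUNT_PATH = Path(os.environ.get("BUCKET_MOUNT_PATH", "/mnt/bucket"))


def list_objects(prefix: str = "", root: Path = MOUNT_PATH) -> list[str]:
    """List object names under `prefix` using ordinary filesystem calls."""
    base = root / prefix
    return sorted(
        p.relative_to(root).as_posix() for p in base.rglob("*") if p.is_file()
    )


def read_text_object(name: str, root: Path = MOUNT_PATH) -> str:
    """Read one object as UTF-8 text; no Cloud Storage client library is needed."""
    return (root / name).read_text(encoding="utf-8")


if __name__ == "__main__":
    for name in list_objects("reports"):
        print(name, len(read_text_object(name)))
```

Keeping the path configurable means the same code runs locally against any directory and in Cloud Run against the mounted bucket.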

Link
@faun shared a link, 1 year, 6 months ago
FAUN.dev()

How Ahrefs gets a Billion dollar-worth infrastructure with a 90% discount

AWS On-Demand vs. AWS Reserved Instances: infrastructure costs can skyrocket with AWS On-Demand, while switching to a serverless architecture can cut costs significantly. The potential for cost savings with AWS serverless setups is clear, but it's important to carefully consider all options to optimize…
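Comparing pricing options comes down to simple arithmetic. This Python sketch computes the monthly cost of a fleet at two hourly rates and the percentage saved by the cheaper one. The rates are hypothetical placeholders, not actual AWS prices, and 730 hours is used as an approximate month.

```python
HOURS_PER_MONTH = 730  # common approximation: 8,760 hours / 12 months


def monthly_cost(hourly_rate: float, instances: int, hours: float = HOURS_PER_MONTH) -> float:
    """Cost of running `instances` identical machines for one month."""
    return hourly_rate * instances * hours


def savings_pct(baseline: float, alternative: float) -> float:
    """Percentage saved by choosing `alternative` over `baseline`."""
    return (baseline - alternative) / baseline * 100


if __name__ == "__main__":
    # Hypothetical hourly rates for illustration only; these are NOT real AWS prices.
    on_demand = monthly_cost(0.40, instances=100)
    reserved = monthly_cost(0.25, instances=100)
    print(f"On-Demand: ${on_demand:,.0f}/month")
    print(f"Reserved:  ${reserved:,.0f}/month")
    print(f"Savings:   {savings_pct(on_demand, reserved):.1f}%")
```

With these placeholder rates the reserved option saves 37.5%; plugging in real quotes for each option gives a like-for-like comparison.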

Link
@faun shared a link, 1 year, 6 months ago
FAUN.dev()

Distributed Tracing for Distributed System: Save Your Time & Company

Two principles are worth following today: 1) Building a microservice-based distributed system without proper monitoring and observability tools is challenging, because the root cause of bottlenecks can be hard to identify. 2) Understanding the basics of distributed systems, such as how they consis…
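Distributed tracing helps locate bottlenecks because each request produces a tree of timed spans linked by parent IDs. The Python sketch below picks the span with the largest self time (its duration minus its direct children's) as a first guess at the bottleneck. The `Span` shape is a simplified, hypothetical model rather than any tracing library's API, and the self-time calculation assumes child spans run one after another rather than in parallel.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """A timed operation within one trace (simplified, hypothetical model)."""
    span_id: str
    parent_id: Optional[str]   # None for the root span
    service: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def self_time_ms(span: Span, spans: list[Span]) -> float:
    """Duration of `span` minus its direct children.

    Assumes children run one after another; overlapping (parallel) children
    would be double-counted and could make this negative.
    """
    children = [s for s in spans if s.parent_id == span.span_id]
    return span.duration_ms - sum(c.duration_ms for c in children)


def find_bottleneck(spans: list[Span]) -> Span:
    """Return the span that spends the most time in its own work."""
    return max(spans, key=lambda s: self_time_ms(s, spans))


if __name__ == "__main__":
    # One hypothetical checkout request crossing three services.
    trace = [
        Span("a", None, "api-gateway", 0, 500),
        Span("b", "a", "orders", 10, 110),
        Span("c", "a", "payments", 120, 480),
        Span("d", "c", "payments-db", 130, 200),
    ]
    culprit = find_bottleneck(trace)
    print(culprit.service, self_time_ms(culprit, trace))
```

Here the gateway's total time is dominated by the payments service, and most of that time is spent in payments itself rather than in its database call, which is exactly the kind of conclusion that is hard to reach without traces.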

Link
@faun shared a link, 1 year, 6 months ago
FAUN.dev()

Scaling PHP Applications with RoadRunner

Application servers like RoadRunner use long-lived PHP processes to handle multiple requests without constantly bootstrapping new execution environments, reducing overhead and improving performance. This tutorial will guide you through developing a PHP application on RoadRunner, explaining its setup…
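The difference between bootstrapping per request and reusing a long-lived worker can be illustrated with a language-neutral sketch, written here in Python rather than PHP. It does not use RoadRunner's API; the `App` class is a hypothetical stand-in for an application with an expensive bootstrap, and the counter simply records how many times that bootstrap runs.

```python
class App:
    """Hypothetical stand-in for an app with an expensive bootstrap
    (loading config, building a DI container, registering routes)."""

    bootstrap_count = 0  # class-level counter: how many times bootstrap ran

    def __init__(self) -> None:
        App.bootstrap_count += 1  # placeholder for the expensive setup work

    def handle(self, request: str) -> str:
        return f"handled {request}"


def per_request_model(requests: list[str]) -> list[str]:
    """Traditional PHP model: the application is bootstrapped again for every request."""
    return [App().handle(r) for r in requests]


def long_lived_worker_model(requests: list[str]) -> list[str]:
    """Worker model: bootstrap once, then reuse the same instance for every request."""
    app = App()
    return [app.handle(r) for r in requests]


if __name__ == "__main__":
    reqs = [f"/page/{i}" for i in range(5)]
    for model in (per_request_model, long_lived_worker_model):
        App.bootstrap_count = 0
        model(reqs)
        print(model.__name__, "bootstraps:", App.bootstrap_count)
```

Handling five requests bootstraps the application five times in the per-request model and once in the worker model. The trade-off is that anything held in a long-lived worker persists between requests, so per-request state has to be reset explicitly.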

Link
@faun shared a link, 1 year, 6 months ago
FAUN.dev()

Distributed Circuit Breakers in Event-Driven Architectures on AWS

Understand how circuit breakers work in event-driven architectures, including stateful checks and the handling of slow requests. The article also discusses implementations in serverless architectures, such as using ElastiCache for state storage. Recommended resources for further reading and considerations for high…
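A distributed circuit breaker keeps its state (closed, open, half-open) in shared storage so that every function instance sees the same view of a failing dependency. The Python sketch below uses a pluggable store: `InMemoryStore` stands in for a shared cache such as ElastiCache (an assumption; no cache client is used), and calls slower than `slow_call_s` count as failures. All names and defaults are hypothetical, and for brevity the read-modify-write on the store is not atomic, which a real shared-cache implementation would need to address.

```python
import time
from typing import Callable, Optional, Protocol, TypeVar

T = TypeVar("T")
CLOSED = {"state": "CLOSED", "failures": 0, "opened_at": 0.0}


class StateStore(Protocol):
    """Shared storage for breaker state (a cache cluster in production)."""

    def get(self, key: str) -> Optional[dict]: ...
    def set(self, key: str, value: dict) -> None: ...


class InMemoryStore:
    """Single-process stand-in for a shared cache such as ElastiCache."""

    def __init__(self) -> None:
        self._data: dict = {}

    def get(self, key: str) -> Optional[dict]:
        return self._data.get(key)

    def set(self, key: str, value: dict) -> None:
        self._data[key] = dict(value)  # store a copy, like a real external cache


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, name: str, store: StateStore, failure_threshold: int = 3,
                 reset_timeout_s: float = 30.0, slow_call_s: float = 2.0,
                 clock: Callable[[], float] = time.time) -> None:
        # Wall-clock time (not time.monotonic) so that timestamps written by
        # one instance are meaningful to another.
        self.name, self.store, self.clock = name, store, clock
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.slow_call_s = slow_call_s

    def call(self, fn: Callable[[], T]) -> T:
        state = dict(self.store.get(self.name) or CLOSED)
        started = self.clock()
        if state["state"] == "OPEN":
            if started - state["opened_at"] < self.reset_timeout_s:
                raise CircuitOpenError(f"circuit {self.name!r} is open")
            state["state"] = "HALF_OPEN"  # reset timeout elapsed: allow a trial call
        try:
            result = fn()
        except Exception:
            self._record_failure(state, started)
            raise
        if self.clock() - started > self.slow_call_s:
            # The call succeeded but too slowly; count it against the dependency.
            self._record_failure(state, started)
        else:
            self.store.set(self.name, CLOSED)  # success closes the circuit, resets failures
        return result

    def _record_failure(self, state: dict, now: float) -> None:
        state["failures"] += 1
        if state["state"] == "HALF_OPEN" or state["failures"] >= self.failure_threshold:
            state["state"], state["opened_at"] = "OPEN", now
        self.store.set(self.name, state)
```

Injecting the clock keeps the breaker testable: a fake clock can jump past `reset_timeout_s` to exercise the half-open trial call without sleeping.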

Link
@faun shared a link, 1 year, 6 months ago
FAUN.dev()

Building a GitOps CI/CD Pipeline with GitHub Actions (SOC 2)

This guide details a GitOps-based CI/CD pipeline on GitHub Actions for SOC 2 compliance, with an emphasis on simplicity and developer experience. The workflow includes automated testing, artifact publishing, and infrastructure deployment controlled through pull requests…

Link
@faun shared a link, 1 year, 6 months ago
FAUN.dev()

Optimize Kubernetes Pods’ Startup Time Using VolumeSnapshots

Pod startup time is crucial for application performance and user experience. This blog post details how VolumeSnapshots were used to reduce pod startup times by 83% in AWS environments at Riskified. VolumeSnapshot is a Kubernetes feature that captures and restores application volumes, improving applicat…

Story
@squadcast shared a post, 1 year, 6 months ago

How to Use Observability Tools to Set SLOs for Kubernetes Applications

Kubernetes

This blog post explores how to use observability tools to set and maintain Service Level Objectives (SLOs) for Kubernetes applications, starting with the difference between SLOs, SLIs (Service Level Indicators), and SLAs (Service Level Agreements). The post recommends Prometheus, Grafana, and Jaeger as observability tools for Kubernetes: Prometheus collects metrics, Grafana visualizes them, and Jaeger traces requests, which together support setting SLOs and troubleshooting performance issues.

Using these tools effectively involves three steps: observing your service's behavior, setting thresholds and error budgets for SLOs, and updating SLOs as your system evolves. Following these steps helps ensure your Kubernetes applications meet their performance and availability targets.
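Setting thresholds and error budgets reduces to arithmetic over request counts. This Python sketch computes an availability SLI and the remaining error budget from good and total request counts, which in practice might come from Prometheus counters over the SLO window (an assumption). The 99.9% target, the 25% alert threshold, and the sample counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float                   # measured fraction of good requests
    allowed_bad: float           # error budget: bad requests the SLO tolerates
    observed_bad: int            # bad requests actually seen in the window
    budget_remaining_pct: float  # share of the error budget still unspent


def error_budget(good: int, total: int, slo_target: float) -> SloReport:
    """Compute an availability SLI and error budget from request counts.

    `good` and `total` are counts over the SLO window; `slo_target` is a
    fraction such as 0.999 for a 99.9% availability objective.
    """
    if total <= 0 or not 0 < slo_target <= 1:
        raise ValueError("need total > 0 and 0 < slo_target <= 1")
    allowed_bad = (1 - slo_target) * total
    observed_bad = total - good
    remaining = (
        (allowed_bad - observed_bad) / allowed_bad * 100 if allowed_bad > 0 else 0.0
    )
    return SloReport(good / total, allowed_bad, observed_bad, remaining)


def needs_attention(report: SloReport, threshold_pct: float = 25.0) -> bool:
    """Hypothetical alerting rule: flag the service when most of the budget is spent."""
    return report.budget_remaining_pct < threshold_pct


if __name__ == "__main__":
    report = error_budget(good=999_400, total=1_000_000, slo_target=0.999)
    print(f"SLI {report.sli:.4%}, budget remaining {report.budget_remaining_pct:.1f}%")
    print("needs attention:", needs_attention(report))
```

With 1,000,000 requests and a 99.9% target, the budget is 1,000 bad requests; 600 observed failures leave 40% of the budget, so this example stays above the hypothetical 25% threshold.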