Posts & Updates about "reliability"

Posts tagged with reliability

Story

@squadcast shared a post, 1 year, 9 months ago

Striking a Balance: Reliability Management for Innovation-Driven Companies

#reliabi... #inciden... #reliabi...

This blog post dives into the world of reliability management for SRE teams. It emphasizes the importance of achieving a balance between innovation and system stability. The article explores various frameworks and best practices that SRE teams can leverage to achieve this equilibrium. Some of the key takeaways include implementing SLOs and error budgets, adopting DevOps practices, and utilizing Infrastructure as Code (IaC). The blog also highlights the importance of fostering a culture of collaboration and learning within the SRE team.
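As a rough illustration of the error budget idea mentioned in this summary, the sketch below turns an availability SLO into an allowed-downtime budget. It is not taken from the linked post; the function names, the 30-day window, and the sample numbers are assumptions chosen for the example.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window.

    slo_target is a fraction such as 0.999 (99.9%). The 30-day default
    window is an assumption for this sketch, not a value from the post.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, window_days: int,
                     downtime_minutes: float) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    # A 99.9% SLO over 30 days allows about 43.2 minutes of downtime.
    print(round(error_budget_minutes(0.999, 30), 1))
    # After 10.8 minutes of downtime, about 75% of the budget is left.
    print(round(budget_remaining(0.999, 30, 10.8), 2))
```

A team could use the remaining fraction as a simple signal: plenty of budget left favors shipping changes, while a nearly spent budget favors stability work.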

Story

@boldlink shared a post, 3 years, 7 months ago

AWS DevOps Consultancy, Boldlink

An Overview of AWS Well-Architected Framework

#Perform... #Securit... #reliabi... #cost #aws

Thinking of getting started with AWS cloud computing or migrating your existing workloads to AWS? Here is a quick guide on how the 5 pillars of the AWS Well-Architected Framework will help you build a secure, high-performing, resilient and efficient cloud infrastructure for your workloads. So basically..

Story

@yair_stark shared a post, 4 years, 1 month ago

Error Budget Is All You Need - Part 2

#monitor... #reliabi... #slo

In part 1 I proposed a simple modification to Google’s Multi-Window Multi-Burn Rate alerting setup and I showed how this modification addresses the cases of varying-traffic services and typical latency SLOs.

Story

@yair_stark shared a post, 4 years, 1 month ago

Error Budget Is All You Need - Part 1

#reliabi... #slo

One of the great chapters of Google’s second Site Reliability Engineering (SRE) book is Chapter 5, Alerting on SLOs (Service Level Objectives). This chapter takes you on a comprehensive journey through several setups of alerts on SLOs. It starts with the simplest, non-optimized setup and iterates through several others to reach the ultimate one, which is optimized with respect to the four main alerting attributes: recall, precision, detection time and reset time.
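For readers new to the multi-window, multi-burn-rate approach these two posts build on, here is a minimal sketch of the basic check. It does not reproduce the author's modification. The function names are hypothetical, and the 14.4 threshold is the value commonly cited for a 1-hour long window paired with a 5-minute short window; treat it as an assumption to verify against the chapter.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How fast the error budget is being spent.

    A burn rate of 1.0 spends exactly the whole budget over the SLO window;
    higher values exhaust it proportionally sooner.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return error_ratio / (1.0 - slo_target)


def should_alert(long_window_error_ratio: float,
                 short_window_error_ratio: float,
                 slo_target: float,
                 threshold: float) -> bool:
    """Fire only when BOTH windows exceed the burn-rate threshold.

    The long window shows the burn is significant; the short window shows
    it is still happening, so the alert resets soon after recovery.
    """
    return (burn_rate(long_window_error_ratio, slo_target) > threshold
            and burn_rate(short_window_error_ratio, slo_target) > threshold)


if __name__ == "__main__":
    SLO = 0.999          # assumed 99.9% target
    PAGE_THRESHOLD = 14.4  # assumed paging threshold (1h long / 5m short)
    # 2% errors in both windows: burn rate is about 20, so the alert fires.
    print(should_alert(0.02, 0.02, SLO, PAGE_THRESHOLD))
    # Errors have already dropped to 0.1% in the short window: no alert.
    print(should_alert(0.02, 0.001, SLO, PAGE_THRESHOLD))
```

Requiring both windows to agree is what balances detection time against reset time: the long window alone would keep firing well after an incident ends.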

Story

@tharunshiv shared a post, 4 years, 1 month ago

Site Reliability Engineer, PhonePe

#1 What's Site Reliability Engineering [SRE] | Roles & Responsibilities | Technologies involved

#SRE #enginee... #enginee... #site #reliabi...

Site Reliability Engineering, popularly referred to as SRE, is a role in Computer Science Engineering whose main purpose is to provision, maintain, monitor, and manage infrastructure in order to maximize application uptime and reliability. SRE is an emerging role, but the tasks an SRE performs have existed ever since the first application was developed. The software developers' scope ends once they have written the code for the application. Everything after that is the duty of an SRE: setting up the infrastructure and the various services that run on it, providing the required network connectivity, providing a platform for the application to run on, and making sure every part of the application is up and running reliably 24x7. In fact, Site Reliability Engineers can be considered the strong bridge between the users and a reliable application.

Link

@prathamesh-sonpatki shared a link, 2 years, 7 months ago

SRE, Last9.io

MTTF vs. MTBF vs. MTTD vs. MTTR

#MTTR #reliabi... #Softwar... #observa...
