DevOps | The fastest way for busy developers to keep up with technologies 🚀

Posts from the community tagged with DevOps...

Story

@squadcast shared a post, 1 week, 6 days ago

Building and Maintaining a Strong SRE Team in Your Company: 7 Key Tips

#SRE Too... #SRE #DevOps

This blog post offers guidance on building and maintaining an SRE team. It emphasizes the importance of SRE in today's world and outlines seven key tips to achieve success. Here's a summary of those tips:

Start small and focus internally: Begin by assigning staff from existing departments to focus on maintaining service reliability.

Recruit the right people: Look for SRE professionals with problem-solving skills, automation expertise, and a commitment to continuous learning. They should also be excellent team players with a broad perspective. Consider using SRE tooling to improve team efficiency.

Define your SLOs: Establish clear and achievable performance indicators for your systems.

Establish a holistic incident management system: Implement a system for tracking on-call duties and streamlining the incident resolution process. SRE tooling can be helpful here.

Accept failure as inevitable: Recognize that failures are part of the development process. Focus on creating a minimum viable product and improving over time.

Conduct incident postmortems to learn from mistakes: Analyze incidents to identify root causes and develop solutions to prevent future occurrences.

Maintain a user-friendly incident management system: Choose an incident management system that is easy to use, fosters communication, and integrates with other relevant tools.

By following these steps and leveraging SRE tooling, you can establish a strong SRE team that keeps your systems reliable and your customers satisfied.

356 views

Story

@rusychokshi shared a post, 10 months, 2 weeks ago

Technical Writer, Cloudraft.io

Demystifying DevOps: Key Insights Every Developer Needs to Thrive?

#continu... #Continu... #Develop... #DevOps

If you are a developer, you need a clear understanding of DevOps principles. This article outlines the core DevOps concepts from a developer's perspective.

735 views

Story

@vmihailenco shared a post, 10 months, 3 weeks ago

@uptrace

Getting started with Kvrocks and go-redis

#Program... #DevOps

Learn how to use go-redis client to get started with Apache Kvrocks, a distributed key-value NoSQL database.

1k views

Story

@vmihailenco shared a post, 11 months, 2 weeks ago

@uptrace

Monitoring CPU/RAM/disk metrics with OpenTelemetry and Uptrace

#open so... #Perform... #DevOps #monitor...

OpenTelemetry Collector is an open source data collection pipeline that allows you to monitor CPU, RAM, disk, and network metrics, among many others.

The Collector itself does not include built-in storage or analysis capabilities, but you can export the data to Uptrace and ClickHouse, using them as a replacement for Grafana and Prometheus.

Compared to Prometheus, ClickHouse can offer a smaller on-disk data size and better query performance when analyzing millions of timeseries.

1k views

Story

@mohammad_zaigam shared a post, 1 year, 1 month ago

Technical Solutions Specialist, Logiq.ai

THE 5 STAGES OF THE OBSERVABILITY MATURITY MODEL

#cloud #monitor... #observa... #AIOps #DevOps

The unprecedented growth of data in recent years has led to a demand for evolution in traditional monitoring practices.

The current observability maturity model is a good starting point but needs further augmentation.

The widely accepted model includes the following stages:

1) Monitoring (Is everything in working order?)

2) Observability (Why is it not working?)

3) Full-Stack Observability (What is the origin of the problem, and what are its consequences?)

4) Intelligent Observability (How to predict anomalies and automate response?)

LOGIQ is supporting the next stage in the model, i.e., Federated Observability. In other words, data is made available to consumers on demand.

1k views

Story

@squadcast shared a post, 1 year, 2 months ago

Strategies for Kubernetes Cluster Administrators: Understanding Pod Scheduling

#kuberne... #SRE #DevOps

As the complexity of a Kubernetes cluster grows, managing resources such as CPU and memory becomes more challenging. Efficient pod scheduling is critical to ensure optimal resource utilization and enable a stable and responsive environment for applications to run in. In this blog, we will delve into the intricacies of pod scheduling, including optimization of resource allocation and balancing workloads.
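
To make the idea of resource-aware scheduling concrete, here is a minimal sketch of a filter-then-score placement loop. It is a toy model, not the Kubernetes scheduler: the `Node` class, the `schedule` function, and the scoring rule (prefer the node with the most free capacity left after placement) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical node model: capacity and usage in millicores and MiB."""
    name: str
    cpu_millis: int
    mem_mib: int
    cpu_used: int = 0
    mem_used: int = 0

    def fits(self, cpu: int, mem: int) -> bool:
        # Filter step: the pod's requests must fit in the remaining capacity.
        return (self.cpu_used + cpu <= self.cpu_millis
                and self.mem_used + mem <= self.mem_mib)

    def free_fraction_after(self, cpu: int, mem: int) -> float:
        # Score step: average free CPU and memory fraction after placement.
        cpu_free = (self.cpu_millis - self.cpu_used - cpu) / self.cpu_millis
        mem_free = (self.mem_mib - self.mem_used - mem) / self.mem_mib
        return (cpu_free + mem_free) / 2


def schedule(nodes, cpu: int, mem: int):
    """Place one pod; return the chosen node name, or None if nothing fits."""
    candidates = [n for n in nodes if n.fits(cpu, mem)]
    if not candidates:
        return None  # the pod would stay pending
    best = max(candidates, key=lambda n: n.free_fraction_after(cpu, mem))
    best.cpu_used += cpu
    best.mem_used += mem
    return best.name
```

Preferring the emptiest node spreads load across the cluster. Picking the fullest node that still fits would instead pack pods tightly, which can leave whole nodes free to scale down.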

1k views

Story

@squadcast shared a post, 1 year, 3 months ago

What are Webhooks and why should developers use them?

#develop... #SRE #DevOps

Webhooks and APIs are developer-friendly building blocks for modern web applications. In this blog, we explain what a webhook is, compare webhooks and APIs in detail, and explain why we recommend developers use webhooks with Squadcast.
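
With an API, the client has to ask for new data. With a webhook, the provider pushes an event to a URL the client exposes. The sketch below shows only the receiving side's core logic: verify that the payload was signed with a shared secret, then parse it. The function names, the HMAC-SHA256 signing scheme, and the payload fields are assumptions for illustration and do not describe Squadcast's actual webhook format.

```python
import hashlib
import hmac
import json


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body (hypothetical scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def handle_webhook(secret: bytes, body: bytes, signature: str):
    """Return (http_status, parsed_event) for an incoming webhook call."""
    expected = sign(secret, body)
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return 401, None
    try:
        event = json.loads(body)
    except ValueError:
        # Covers malformed JSON and undecodable bytes.
        return 400, None
    return 200, event
```

In a real service this function would sit behind an HTTP endpoint, with the signature read from a request header agreed with the sender.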

1k views

Story

@emile shared a post, 1 year, 3 months ago

Co-founder, Nebuly

Tutorial on Dynamic GPU Partitioning with MIG to Maximize the Utilization of GPUs in Kubernetes

#kuberne... #GPU #MLOps #utiliza... #DevOps

Partitioning is a way to divide GPU resources into smaller slices. This allows Pods to be scheduled only on the memory/compute resources they actually need, thus increasing GPU utilization and reducing infrastructure costs in Kubernetes clusters.

nos, an open-source tool to maximize GPU utilization in Kubernetes

1k views

Story

@squadcast shared a post, 1 year, 3 months ago

Introducing our open source SLO Tracker - A simple tool to track SLOs and Error Budget

#Inciden... #observa... #DevOps #SRE

Check out our open-source SLO tracker and set up your SLOs so that you can accurately track your error budgets. Automate your SRE with Squadcast's SLO tool!
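
For readers new to the term, an error budget is the amount of failure an SLO still permits: with a 99.9% target, 0.1% of events may fail. The sketch below shows that arithmetic only. The function name and the returned fields are hypothetical, and it is not taken from the SLO Tracker's code.

```python
def error_budget(slo_target: float, total_events: int, bad_events: int) -> dict:
    """Compute error-budget figures for a request-based SLO.

    slo_target is a fraction such as 0.999 (meaning 99.9% of events succeed).
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1, e.g. 0.999")
    # Number of failed events the SLO tolerates over this window.
    allowed_bad = (1 - slo_target) * total_events
    # Budget left; a negative value means the SLO has been breached.
    remaining = allowed_bad - bad_events
    # Share of the budget already spent (0.0 when there is no traffic yet).
    consumed = bad_events / allowed_bad if allowed_bad else 0.0
    return {"allowed_bad": allowed_bad, "remaining": remaining, "consumed": consumed}
```

For example, with a 99.9% target over one million requests, roughly 1,000 failures are allowed, so 250 failures would use about a quarter of the budget.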

1k views

Story

@squadcast shared a post, 1 year, 3 months ago

What are Network Operation Centers (NOC) and how do NOC teams work?

#on-call... #DevOps #SRE #Inciden...

In highly competitive markets, businesses have to work hard to stay available and operational at all times. Hence, they invest heavily in dedicated Network Operations Centers (NOCs) that constantly monitor the performance of an organization’s IT resources. In this blog, we will explore NOCs and their importance.

1k views
