Disclaimer: This blog post was written as part of a collaboration with Lightrun.
Challenges in EKS Troubleshooting
Since its launch in 2018, Amazon Elastic Kubernetes Service (EKS) has grown rapidly, becoming a preferred choice for both enterprises and startups. Its rise to prominence reflects the increasing adoption of cloud-native architectures and the demand for scalable and resilient container orchestration platforms.
Today, EKS supports a vast array of applications, ranging from small-scale microservices to large-scale distributed systems. It has quickly become the most widely used managed Kubernetes service, according to a survey from the CNCF.
However, the rapid growth and widespread adoption of EKS clusters have brought unique challenges in troubleshooting and maintaining applications. The disconnect between developers and production applications has become more pronounced. Local and remote development environments often fail to capture the subtleties and complexities of the actual production environment running in EKS. This gap limits developers' ability to accurately reproduce and debug issues that surface only in the production EKS environment.
Additionally, managing a production-grade distributed cluster of pods, each of which may have its own network, storage, and security requirements, makes troubleshooting a complex process. Network restrictions, access controls, and limited visibility into running containers further slow debugging and performance analysis.
The main objective of this article is to explore how Lightrun, a dynamic observability platform, can enhance observability and debugging capabilities in EKS clusters. We will examine how real-time insights and dynamic instrumentation capabilities help developers and operators gain a deeper understanding of their applications' behavior in EKS production clusters.
Introducing Lightrun: A Developer-Oriented Observability Platform
Lightrun, a powerful observability platform tailored for developers, is specifically designed to address these challenges of troubleshooting in AWS EKS and other Kubernetes platforms. By providing real-time insights and dynamic instrumentation, Lightrun enhances developers' ability to identify and resolve issues without disrupting the production environment.
Integrated with EKS, Lightrun enables developers to debug, log, and monitor their applications on the fly. They can set breakpoints, inspect variables, and analyze logs and metrics in real time, all without requiring code changes or redeployments. This integration streamlines the debugging process and accelerates issue resolution in EKS clusters.
In the next section, we will explore the step-by-step process of integrating Lightrun into your AWS EKS cluster. We will dive into the features and capabilities that Lightrun brings to the table.
Dynamic Instrumentation and Live Troubleshooting with Lightrun
Before starting, you need an EKS cluster up and running. You then set up the Lightrun agent on your cluster, and there are multiple ways to do this:
You can integrate Lightrun directly into your code. If you are using Node.js, you will need to install the "lightrun" package using npm (npm install lightrun) and then configure it in your application while replacing LIGHTRUN_SECRET and FULL_PATH_TO_METADATA_FILE with their values: