Core Kubernetes Monitoring Concepts and Architecture
Kubernetes monitoring operates on a multi-layered architecture that collects metrics from various cluster components. The metrics-server gathers resource metrics from kubelet processes on nodes, providing CPU and memory utilization data for pods and nodes.
How Metrics Flow Through Your Cluster
The CKA exam focuses heavily on understanding how metrics flow through the system and where monitoring data originates. Control plane components also require monitoring, including the API server, scheduler, and controller-manager. Prometheus serves as the industry standard for collecting and storing time-series data, while Grafana provides visualization capabilities.
These tools integrate with Kubernetes through service discovery, notably Prometheus's built-in Kubernetes service discovery (kubernetes_sd). The monitoring stack also includes Alertmanager for alert routing and kube-state-metrics for tracking Kubernetes object states rather than raw resource usage.
Resource Metrics vs. Custom Metrics
Focus on understanding the difference between resource metrics (CPU and memory) and custom metrics, as this distinction appears frequently in practical scenarios. The API server's metrics endpoint works with kubectl to query metrics directly using commands like kubectl top nodes and kubectl top pods.
Querying Cluster Metrics
You should practice these kubectl commands regularly:
- kubectl top nodes: View node resource consumption
- kubectl top pods: View pod resource consumption
- kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes: Query the Metrics API directly
- kubectl describe nodes: Detailed node resource information
Implementing Metrics Collection with Prometheus and kube-state-metrics
Prometheus monitoring in Kubernetes requires specific configuration and understanding of scraping mechanics. The prometheus.yml configuration file defines scrape jobs that tell Prometheus which endpoints to pull metrics from at regular intervals, typically every 15-30 seconds.
Dynamic Service Discovery
For Kubernetes, you'll write scrape configs that use kubernetes_sd_configs to automatically discover and monitor API servers, nodes, pods, and services. This dynamic discovery is crucial because Kubernetes workloads are ephemeral: pods and nodes constantly appear and disappear.
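A minimal scrape job using Kubernetes service discovery might look like the following sketch. The job names and the scrape annotation convention are illustrative choices, not part of any standard:

```yaml
scrape_configs:
  - job_name: kubernetes-nodes        # illustrative job name
    kubernetes_sd_configs:
      - role: node                    # discover every node in the cluster
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod                     # discover pods as they come and go
    relabel_configs:
      # Keep only pods annotated prometheus.io/scrape: "true"
      # (a common convention, not a built-in Kubernetes feature)
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
```

The role field (node, pod, service, endpoints) controls which object type Prometheus discovers, and relabel_configs filters the discovered targets down to the ones you actually want to scrape.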
Kube-state-metrics complements Prometheus by exposing Kubernetes-specific metadata, such as pod replica counts, deployment status, and node conditions, that the kubelet doesn't provide. When implementing monitoring for the CKA exam, understand how to deploy these components as pods within the cluster, configure ServiceMonitors or PrometheusRule objects (when using the Prometheus Operator), and grant the proper RBAC permissions.
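Prometheus needs read access to the discovery and metrics APIs; a commonly used ClusterRole grants it (the name and exact resource list here are illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus          # illustrative name
rules:
  - apiGroups: [""]
    resources: [nodes, nodes/metrics, services, endpoints, pods]
    verbs: [get, list, watch]       # read-only discovery access
  - nonResourceURLs: [/metrics]
    verbs: [get]                    # scrape the API server's own metrics
```

You would then bind this ClusterRole to the ServiceAccount that the Prometheus pod runs under via a ClusterRoleBinding.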
Understanding Metric Labels
A critical concept is understanding metric labels and how Prometheus uses them for filtering and aggregation. The metric container_memory_usage_bytes carries labels such as pod, namespace, and node (older Kubernetes releases used pod_name), letting you answer questions like which pods consume the most memory in a specific namespace.
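To make the label mechanics concrete, here is a small sketch, not part of any Prometheus client library, that parses one line of Prometheus's text exposition format into its metric name, labels, and value:

```python
import re

def parse_metric(line):
    """Parse one line of Prometheus text exposition format into
    (name, labels, value). Handles the simple common case only."""
    match = re.match(r'(\w+)\{([^}]*)\}\s+(\S+)', line)
    name, raw_labels, value = match.groups()
    labels = dict(re.findall(r'(\w+)="([^"]*)"', raw_labels))
    return name, labels, float(value)

name, labels, value = parse_metric(
    'container_memory_usage_bytes{pod="web-1",namespace="prod"} 104857600')
print(name)                  # container_memory_usage_bytes
print(labels["namespace"])   # prod
print(value)                 # 104857600.0
```

Once metrics are label-structured like this, filtering "memory usage in namespace prod" is just selecting the series whose namespace label matches, which is exactly what a PromQL selector does.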
Pull-Based vs. Push-Based Monitoring
You must grasp the difference between these monitoring approaches:
- Pull-based (Prometheus): Prometheus scrapes metrics from endpoints at regular intervals
- Push-based: Applications push metrics to a receiving system
Kubernetes typically uses pull-based collection because it fits the cluster's dynamic, distributed architecture.
Logging, Events, and Troubleshooting with Monitoring Data
While metrics provide numerical data about cluster and application health, logs and events tell the story of what's happening in your cluster. Kubernetes generates system events automatically when significant actions occur, such as pod scheduling, failures, or resource warnings.
You can query events using kubectl get events or kubectl describe pods. The CKA exam expects you to interpret these events to diagnose problems effectively.
Understanding Kubernetes Logging
Different components generate different logs:
- Kubelet logs: Generated on each node for container lifecycle events
- API server logs: Track all API requests and cluster operations
- Application logs: Generated by containers, accessible via kubectl logs
- Audit logs: Record sensitive operations for compliance
Correlating Data for Troubleshooting
Understanding how to correlate events, pod logs, and metrics is essential. If a pod is crash-looping, you'd check pod events for scheduling errors, use kubectl logs to see application output, and examine metrics to understand resource constraints.
Centralized logging systems like ELK (Elasticsearch, Logstash, Kibana) or Loki can aggregate logs from all containers. This makes troubleshooting easier across multiple pods and nodes.
Advanced Troubleshooting Techniques
The CKA exam focuses on your ability to manually diagnose issues. Use these approaches:
- Access node logs over SSH, using journalctl -u kubelet when systemd manages the kubelet
- Remember that container logs are ephemeral: kubectl logs --previous reaches only the immediately preceding container instance, and logs are lost once the pod is deleted
- Examine kubelet logs on nodes where pod scheduling fails or containers crash
- Access control plane logs on control plane nodes (typically in /var/log/pods or /var/log/containers)
- Use kubectl debug to inject an ephemeral debugging container into a running or crashing pod
Resource Limits, Requests, and Performance Monitoring
Kubernetes resource requests and limits directly impact monitoring and cluster health. Resource requests tell the scheduler how much CPU and memory a pod needs, enabling proper pod placement across nodes. Limits enforce maximum resource consumption, protecting the node from being overwhelmed by a single pod.
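A pod spec wiring requests and limits together might look like this sketch (names, image, and values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web                 # illustrative name
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:           # what the scheduler reserves for placement
          cpu: 250m
          memory: 256Mi
        limits:             # hard ceiling enforced at runtime
          cpu: 500m
          memory: 512Mi
```

The scheduler only considers requests when placing the pod; limits come into play later, when the container actually tries to exceed them.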
Interpreting Resource Usage Data
For the CKA exam, you must understand how requests and limits relate to monitoring metrics. The kubectl top command shows actual resource usage against the requested amounts, helping identify over-provisioned or under-provisioned applications.
For example, if a pod requests 500Mi of memory but consistently uses only 100Mi, you're wasting cluster resources. Conversely, if a pod regularly approaches its memory limit, you risk out-of-memory kills, and sustained pressure on the node can trigger eviction.
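The over/under-provisioning check above is simple arithmetic; here is a sketch (the 30% and 90% thresholds are illustrative choices, not Kubernetes defaults):

```python
def provisioning_status(request_mi, usage_mi, low=0.3, high=0.9):
    """Classify a pod's memory allocation from its request and
    observed usage. Thresholds are illustrative, not K8s defaults."""
    ratio = usage_mi / request_mi
    if ratio < low:
        return "over-provisioned"   # wasting reserved capacity
    if ratio > high:
        return "under-provisioned"  # at risk of OOM kills or eviction
    return "ok"

print(provisioning_status(500, 100))  # over-provisioned (100/500 = 0.2)
print(provisioning_status(500, 480))  # under-provisioned (480/500 = 0.96)
```

In practice you would feed this from kubectl top output or Prometheus queries rather than hard-coded numbers.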
Optimizing Resource Allocation
The Vertical Pod Autoscaler (VPA) uses historical monitoring data to recommend appropriate resource requests and limits, automating this optimization. The CKA exam may ask about identifying problematic resource allocations using monitoring tools or interpreting why pods are being evicted from nodes.
Node Pressure Conditions
When a node experiences memory pressure, disk pressure, or PID pressure, the kubelet begins evicting pods following an eviction policy based on QoS class. Understanding how to monitor these conditions and respond appropriately is essential for the exam.
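Those pressure conditions are driven by the kubelet's eviction thresholds. A KubeletConfiguration fragment might set them like this (the specific values are illustrative, not necessarily the defaults):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:                 # kubelet evicts immediately below these
  memory.available: "100Mi"
  nodefs.available: "10%"
evictionSoft:                 # evicts only after the grace period elapses
  memory.available: "300Mi"
evictionSoftGracePeriod:
  memory.available: "1m30s"
```

Crossing a threshold is what flips the node's MemoryPressure or DiskPressure condition to True, which you can observe with kubectl describe nodes.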
Quality of Service Classes
Three QoS classes determine eviction order during resource contention:
- Guaranteed: Requests equal limits, evicted last
- Burstable: Requests less than limits, evicted second
- BestEffort: No requests or limits, evicted first
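The QoS rules above can be sketched as a small function. This is a simplified model for a single-container pod; real Kubernetes evaluates every container in the pod:

```python
def qos_class(requests, limits):
    """Simplified QoS classification for a single-container pod.
    requests/limits map resource names ('cpu', 'memory') to values,
    or are empty dicts when unset. Real K8s checks all containers."""
    # When only limits are set, Kubernetes defaults requests to limits.
    if limits and not requests:
        requests = limits
    if not requests and not limits:
        return "BestEffort"
    # Guaranteed requires CPU and memory requests equal to limits.
    if requests == limits and "cpu" in limits and "memory" in limits:
        return "Guaranteed"
    return "Burstable"

print(qos_class({"cpu": "500m", "memory": "512Mi"},
                {"cpu": "500m", "memory": "512Mi"}))    # Guaranteed
print(qos_class({"memory": "256Mi"}, {"memory": "512Mi"}))  # Burstable
print(qos_class({}, {}))                                # BestEffort
```

You can verify a running pod's class with kubectl get pod, which reports it in the status field under qosClass.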
Monitoring tools should help identify which QoS class your critical applications belong to and whether you've properly configured them for reliability.
CKA Exam Monitoring Scenarios and Practical Tips
The CKA exam typically includes hands-on monitoring scenarios where you must diagnose cluster issues and understand monitoring data. Common scenarios include identifying nodes with high CPU or memory usage, finding pods that crash or fail to start, determining which pods use the most resources, and accessing logs from multiple containers.
Essential kubectl Commands for the Exam
You should practice using kubectl top commands extensively, as they appear frequently in practical scenarios. Here are the most important commands:
- kubectl top nodes: View all node resource consumption
- kubectl top pods --all-namespaces: View pods across all namespaces
- kubectl top pods -n namespace --sort-by=memory: Find memory-heavy pods in a specific namespace
- kubectl describe nodes: Show detailed node status, capacity, and events
- kubectl get nodes -o wide: Quick status view of all nodes
Interpreting kubectl describe Output
Practice interpreting kubectl describe nodes output: its conditions, capacity, allocatable resources, and recent events tell the story of node health, making this single command a rich source of diagnostic information.
Accessing and Analyzing Logs
Familiarize yourself with accessing logs using kubectl logs pod-name. Understand these important flags:
- -f: Follow logs in real-time
- -c: Select specific containers in multi-container pods
- --previous: View logs from a crashed container
- --all-containers: Show logs from all containers in a pod
Common CKA Exam Scenarios
The exam might present scenarios where a deployment isn't scaling as expected, requiring you to check events, pod logs, and node capacity simultaneously. Another common scenario involves understanding why a pod isn't scheduling, which requires examining node conditions, resource availability, and events. Always check RBAC permissions when monitoring or logging queries fail, as insufficient permissions might prevent Prometheus from scraping metrics or your user account from accessing logs.
Practice setting resource requests and limits correctly. Improperly configured resources lead directly to monitoring and performance issues.
