Cluster Monitoring
The Monitoring tab in Clusters → Observability provides a live health snapshot for a cluster.
Use it when you need to quickly answer whether workloads are ready, nodes are healthy, pods are failing, or persistent disks are approaching capacity.
Requirements
Monitoring becomes available once the cluster kubeconfig is ready.
It does not require VictoriaMetrics or VictoriaLogs to be installed. The tab reads Kubernetes resources directly and uses the Kubernetes metrics API when it is available.
If the metrics API is not enabled, Edka still shows workload, pod, node, and storage state, but CPU and memory usage samples may be unavailable.
When VictoriaMetrics and Node Exporter are installed, Monitoring can also show host root filesystem and inode usage from node-exporter metrics.
What the Monitoring tab shows
The top summary shows:
- Component Health: healthy workload components out of total components
- Nodes Ready: ready nodes out of total nodes
- Pods Flagged: pending, failed, and crashing pods across all namespaces
- Persistent Disk: aggregate PVC usage across the cluster
- Disk Alerts: volume claims at high or critical usage
The main panels show:
- CPU and memory pressure against allocatable cluster resources
- Kubernetes requests as a marker against current usage
- storage classes backing persistent volume claims
- snapshot warnings when Edka cannot collect part of the cluster state
- the highest-usage persistent volume claims
- workload components that need attention first
The view refreshes automatically and includes a manual Refresh action.
Component health
Edka groups Kubernetes Deployments, StatefulSets, and DaemonSets into workload components.
Each component is classified as:
- healthy: when desired and available replicas are ready and no matched pods are problematic
- degraded: when at least one replica is ready but the workload is not fully healthy
- critical: when the workload expects replicas but none are ready
- idle: when the workload is intentionally scaled to zero
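The classification rules above can be sketched as a small function. This is a minimal illustration and not Edka's implementation: the function name, its parameters, and the choice to report a fully ready workload with problematic pods as degraded are all assumptions.

```python
def classify_component(desired: int, ready: int, problematic_pods: int = 0) -> str:
    """Hypothetical helper that maps replica counts to a health label."""
    if desired == 0:
        return "idle"      # intentionally scaled to zero
    if ready == 0:
        return "critical"  # replicas are expected but none are ready
    if ready >= desired and problematic_pods == 0:
        return "healthy"   # every replica is ready and no pod is flagged
    return "degraded"      # at least one replica is ready, but not fully healthy
```

For example, a Deployment that wants three replicas and has one ready would be labelled degraded under these assumptions.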
Components are sorted so critical and degraded workloads appear before healthy ones.
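One way to express that ordering is a severity rank used as a sort key. The sketch below is illustrative only: the dictionary shape, the position of idle components after healthy ones, and the alphabetical tie-break are assumptions not stated by Edka.

```python
# Lower rank sorts first. Only "critical and degraded before healthy" comes
# from the documentation; the placement of "idle" is an assumption.
SEVERITY_RANK = {"critical": 0, "degraded": 1, "healthy": 2, "idle": 3}


def sort_components(components: list[dict]) -> list[dict]:
    """Return components ordered by severity, then by name (hypothetical shape)."""
    return sorted(
        components,
        key=lambda c: (SEVERITY_RANK.get(c["status"], len(SEVERITY_RANK)), c["name"]),
    )
```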
Pod state
Monitoring classifies pods into running, pending, failed, crashing, completed, or unknown.
Crash detection includes common Kubernetes container states such as:
- CrashLoopBackOff
- CreateContainerConfigError
- CreateContainerError
- ErrImagePull
- ImagePullBackOff
- RunContainerError
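A rough sketch of how such a classification could work against a Pod object in the shape returned by the Kubernetes API (`status.phase` and `status.containerStatuses[].state.waiting.reason`). The function name, the phase-to-state mapping, and the rule that a crash reason takes precedence over the pod phase are assumptions, not a description of Edka's internals.

```python
CRASH_REASONS = {
    "CrashLoopBackOff",
    "CreateContainerConfigError",
    "CreateContainerError",
    "ErrImagePull",
    "ImagePullBackOff",
    "RunContainerError",
}

# Kubernetes pod phases mapped to the states listed in this section (assumed mapping).
PHASE_TO_STATE = {
    "Running": "running",
    "Pending": "pending",
    "Failed": "failed",
    "Succeeded": "completed",
}


def classify_pod(pod: dict) -> str:
    """Hypothetical classifier for a Pod represented as a plain dict."""
    status = pod.get("status", {})
    for container in status.get("containerStatuses", []):
        waiting = container.get("state", {}).get("waiting") or {}
        if waiting.get("reason") in CRASH_REASONS:
            return "crashing"  # assumption: crash reasons win over the phase
    return PHASE_TO_STATE.get(status.get("phase"), "unknown")
```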
Use Monitoring for the aggregate view and Diagnostics when you need warning events or deeper Kubernetes troubleshooting context.
Resource pressure
The resource pressure panel compares:
- CPU and memory usage from the Kubernetes metrics API
- allocatable cluster CPU and memory
- configured workload requests
- configured workload limits
If node usage samples are missing, the snapshot warning area explains that usage data could not be collected. Capacity, allocatable, requests, and limits can still be displayed from Kubernetes resource specs.
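As a worked illustration of that comparison, the sketch below expresses usage, requests, and limits as percentages of allocatable capacity and tolerates missing usage samples. The function name, field names, and warning text are hypothetical, and all values are assumed to share one unit (for example CPU cores).

```python
from typing import Optional


def pressure_summary(
    allocatable: float,
    requests: float,
    limits: float,
    usage: Optional[float] = None,
) -> dict:
    """Hypothetical summary of one resource relative to allocatable capacity."""

    def pct(value: Optional[float]) -> Optional[float]:
        if value is None or allocatable <= 0:
            return None
        return round(100 * value / allocatable, 1)

    return {
        "usage_pct": pct(usage),        # None when metrics samples are missing
        "requests_pct": pct(requests),  # derived from resource specs
        "limits_pct": pct(limits),      # derived from resource specs
        "warning": None if usage is not None else "usage samples unavailable",
    }
```

With 8 allocatable cores, 4 requested, 6 in limits, and no usage sample, the requests and limits percentages are still computed while usage is reported as missing.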
Persistent disks
The Persistent Disks table lists persistent volume claims with:
- namespace and claim name
- bind phase
- used and total capacity when node volume stats are available
- storage class
- pods currently mounting the claim
Disk usage is considered high at 85% and critical at 95%. These thresholds
also feed the global notification system.
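The thresholds can be illustrated with a short sketch. The function and label names are hypothetical, and treating each threshold as inclusive (exactly 85% counts as high) is an assumption.

```python
from typing import Optional

HIGH_THRESHOLD_PCT = 85.0
CRITICAL_THRESHOLD_PCT = 95.0


def disk_level(used_bytes: Optional[int], capacity_bytes: Optional[int]) -> str:
    """Hypothetical mapping from PVC usage to an alert level."""
    if used_bytes is None or not capacity_bytes:
        return "unknown"  # volume stats are not available for this claim
    used_pct = 100 * used_bytes / capacity_bytes
    if used_pct >= CRITICAL_THRESHOLD_PCT:
        return "critical"
    if used_pct >= HIGH_THRESHOLD_PCT:
        return "high"
    return "ok"
```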
Monitoring and notifications
The same live cluster inspection that powers Monitoring is used by Edka’s global notification refresh.
When Edka detects conditions such as not-ready nodes, failing pods, degraded workloads, unbound PVCs, or high disk usage, it creates active notifications with direct actions back to Monitoring or Diagnostics.
See Platform Notifications for the full notification workflow and history.
When to use Monitoring vs Metrics
Use Monitoring when you need:
- workload readiness at a glance
- node and pod health summaries
- host filesystem and inode pressure when Node Exporter metrics are available
- PVC capacity and mount visibility
- resource pressure without writing PromQL
- a fast operational triage view
Use Metrics when you need:
- PromQL exploration
- historical timeseries context
- scrape target visibility
- application metrics exposed by workloads