Cluster Monitoring

The Monitoring tab in Clusters → Observability provides a live health snapshot for a cluster.

Use it when you need to quickly answer whether workloads are ready, nodes are healthy, pods are failing, or persistent disks are approaching capacity.

Monitoring becomes available once the cluster kubeconfig is ready.

Monitoring does not require VictoriaMetrics or VictoriaLogs to be installed. The tab reads Kubernetes resources directly and uses the Kubernetes metrics API when it is available.

If the metrics API is not enabled, Edka still shows workload, pod, node, and storage state, but CPU and memory usage samples may be unavailable.

When VictoriaMetrics and Node Exporter are both installed, Monitoring can also show host root filesystem and inode usage from Node Exporter metrics.

The top summary shows:

  • Component Health: healthy workload components out of total components
  • Nodes Ready: ready nodes out of total nodes
  • Pods Flagged: pending, failed, and crashing pods across all namespaces
  • Persistent Disk: aggregate PVC usage across the cluster
  • Disk Alerts: volume claims at high or critical usage

The main panels show:

  • CPU and memory pressure against allocatable cluster resources
  • Kubernetes requests as a marker against current usage
  • storage classes backing persistent volume claims
  • snapshot warnings when Edka cannot collect part of the cluster state
  • the highest-usage persistent volume claims
  • workload components that need attention first

The view refreshes automatically and includes a manual Refresh action.

Edka groups Kubernetes Deployments, StatefulSets, and DaemonSets into workload components.

Each component is classified as:

  • healthy when all desired replicas are ready and available and no matched pods are problematic
  • degraded when at least one replica is ready but the workload is not fully healthy
  • critical when the workload expects replicas but none are ready
  • idle when the workload is intentionally scaled to zero
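The four states above can be read as a small decision procedure. The following is a minimal sketch of that reading, not Edka's actual implementation: the Workload type, its field names, and the classify_component function are hypothetical, and the order of the checks is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical summary of a Deployment, StatefulSet, or DaemonSet."""
    desired: int            # replicas the workload spec asks for
    ready: int              # replicas currently reporting ready
    available: int          # replicas currently reporting available
    problem_pods: int = 0   # matched pods that are pending, failed, or crashing


def classify_component(w: Workload) -> str:
    """Map replica counts to one of: idle, critical, degraded, healthy."""
    if w.desired == 0:
        # Scaled to zero on purpose: nothing is expected to run.
        return "idle"
    if w.ready == 0:
        # Replicas are expected but none are ready.
        return "critical"
    fully_ready = w.ready >= w.desired and w.available >= w.desired
    if fully_ready and w.problem_pods == 0:
        return "healthy"
    # At least one replica is ready, but the workload is not fully healthy.
    return "degraded"
```

For example, under these assumptions a workload with 3 desired replicas and 1 ready replica is classified as degraded, and a fully ready workload with one crashing matched pod is also degraded.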

Components are sorted so critical and degraded workloads appear before healthy ones.
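A severity-first ordering like this can be sketched with a rank table. The SEVERITY_RANK mapping and sort_components function are hypothetical names. Placing idle components last and breaking ties by name are assumptions, since the text only states that critical and degraded come before healthy.

```python
# Lower rank sorts first. The position of "idle" is an assumption.
SEVERITY_RANK = {"critical": 0, "degraded": 1, "healthy": 2, "idle": 3}


def sort_components(components):
    """Return components ordered by severity, then by name for stable output."""
    return sorted(
        components,
        key=lambda c: (SEVERITY_RANK.get(c["status"], len(SEVERITY_RANK)), c["name"]),
    )


components = [
    {"name": "api", "status": "healthy"},
    {"name": "db", "status": "critical"},
    {"name": "cache", "status": "degraded"},
    {"name": "batch", "status": "idle"},
]
ordered_names = [c["name"] for c in sort_components(components)]
```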

Monitoring classifies pods into running, pending, failed, crashing, completed, or unknown.

Crash detection includes common Kubernetes container states such as:

  • CrashLoopBackOff
  • CreateContainerConfigError
  • CreateContainerError
  • ErrImagePull
  • ImagePullBackOff
  • RunContainerError
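One way to picture the pod classification is as a lookup on the pod phase, with the container waiting reasons listed above checked first. This is a sketch under stated assumptions: classify_pod and CRASH_REASONS are hypothetical names, and giving crash reasons precedence over the phase is an assumption rather than documented Edka behavior. The phase names (Pending, Running, Succeeded, Failed) are the standard Kubernetes pod phases.

```python
from typing import Iterable

# Container waiting reasons treated as crashing, taken from the list above.
CRASH_REASONS = frozenset({
    "CrashLoopBackOff",
    "CreateContainerConfigError",
    "CreateContainerError",
    "ErrImagePull",
    "ImagePullBackOff",
    "RunContainerError",
})

# Kubernetes pod phase -> Monitoring category.
PHASE_TO_CATEGORY = {
    "Running": "running",
    "Pending": "pending",
    "Failed": "failed",
    "Succeeded": "completed",
}


def classify_pod(phase: str, waiting_reasons: Iterable[str] = ()) -> str:
    """Classify a pod from its phase and its containers' waiting reasons."""
    # Assumption: a crash reason wins even if the phase is Pending or Running.
    if any(reason in CRASH_REASONS for reason in waiting_reasons):
        return "crashing"
    return PHASE_TO_CATEGORY.get(phase, "unknown")
```

With this ordering, a pod stuck in the Pending phase because of ImagePullBackOff is reported as crashing rather than pending.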

Use Monitoring for the aggregate view and Diagnostics when you need warning events or deeper Kubernetes troubleshooting context.

The resource pressure panel compares:

  • CPU and memory usage from the Kubernetes metrics API
  • allocatable cluster CPU and memory
  • configured workload requests
  • configured workload limits

If node usage samples are missing, the snapshot warning area explains that usage data could not be collected. Capacity, allocatable, requests, and limits can still be displayed from Kubernetes resource specs.
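The comparison can be illustrated as percentages of allocatable capacity, with usage treated as optional. This is a minimal sketch: the pressure function and its field names are hypothetical, and all values are assumed to share one unit (for example CPU millicores or memory bytes).

```python
from typing import Optional


def pressure(allocatable: float, requests: float, limits: float,
             usage: Optional[float] = None) -> dict:
    """Express usage, requests, and limits as a percentage of allocatable."""

    def pct(value: Optional[float]) -> Optional[float]:
        # Missing usage samples or zero allocatable capacity yield None.
        if value is None or allocatable <= 0:
            return None
        return round(100.0 * value / allocatable, 1)

    return {
        "usage_pct": pct(usage),
        "requests_pct": pct(requests),
        "limits_pct": pct(limits),
    }


# 8 CPU cores allocatable, 4 requested, 12 in limits, 2 in use (millicores).
with_usage = pressure(allocatable=8000, requests=4000, limits=12000, usage=2000)
# Same cluster without metrics API samples: only spec-derived values remain.
without_usage = pressure(allocatable=8000, requests=4000, limits=12000)
```

Limits can exceed 100% of allocatable capacity when workloads are overcommitted, which is why the sketch does not clamp the result.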

The Persistent Disks table lists persistent volume claims with:

  • namespace and claim name
  • bind phase
  • used and total capacity when node volume stats are available
  • storage class
  • pods currently mounting the claim

Disk usage is considered high at 85% and critical at 95%. These thresholds also feed the global notification system.
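The thresholds can be sketched as a simple level function. The disk_level name and the "ok" and "unknown" labels are hypothetical, and treating the thresholds as inclusive (85% counts as high, 95% as critical) is an assumption.

```python
from typing import Optional

HIGH_PCT = 85.0       # high usage threshold from the text above
CRITICAL_PCT = 95.0   # critical usage threshold from the text above


def disk_level(used_bytes: Optional[int], capacity_bytes: Optional[int]) -> str:
    """Return critical, high, ok, or unknown for a persistent volume claim."""
    # Volume stats may be missing, in which case no level can be computed.
    if used_bytes is None or not capacity_bytes:
        return "unknown"
    used_pct = 100.0 * used_bytes / capacity_bytes
    if used_pct >= CRITICAL_PCT:
        return "critical"
    if used_pct >= HIGH_PCT:
        return "high"
    return "ok"
```

Under these assumptions, a 100 GiB claim with 90 GiB used is reported as high, and a claim without volume stats is reported as unknown rather than ok.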

The same live cluster inspection that powers Monitoring is used by Edka’s global notification refresh.

When Edka detects conditions such as not-ready nodes, failing pods, degraded workloads, unbound PVCs, or high disk usage, it creates active notifications with direct actions back to Monitoring or Diagnostics.

See Platform Notifications for the full notification workflow and history.

Use Monitoring when you need:

  • workload readiness at a glance
  • node and pod health summaries
  • host filesystem and inode pressure when Node Exporter metrics are available
  • PVC capacity and mount visibility
  • resource pressure without writing PromQL
  • a fast operational triage view

Use Metrics when you need:

  • PromQL exploration
  • historical timeseries context
  • scrape target visibility
  • application metrics exposed by workloads