Cluster Monitoring

The Monitoring tab in Clusters → Observability provides a live health snapshot for a cluster.

Use it when you need to quickly answer whether workloads are ready, nodes are healthy, pods are failing, or persistent disks are approaching capacity.

Requirements

Monitoring becomes available once the cluster kubeconfig is ready.

It does not require VictoriaMetrics or VictoriaLogs to be installed. The tab reads Kubernetes resources directly and uses the Kubernetes metrics API when it is available.

If the metrics API is not enabled, Edka still shows workload, pod, node, and storage state, but CPU and memory usage samples may be unavailable.

When VictoriaMetrics and Node Exporter are installed, Monitoring can also show host root filesystem and inode usage from node-exporter metrics.

What the Monitoring tab shows

The top summary shows:

Component Health: healthy workload components out of total components
Nodes Ready: ready nodes out of total nodes
Pods Flagged: pending, failed, and crashing pods across all namespaces
Persistent Disk: aggregate PVC usage across the cluster
Disk Alerts: volume claims at high or critical usage

The main panels show:

CPU and memory pressure against allocatable cluster resources
Kubernetes requests as a marker against current usage
storage classes backing persistent volume claims
snapshot warnings when Edka cannot collect part of the cluster state
the highest-usage persistent volume claims
workload components that need attention first

The view refreshes automatically and includes a manual Refresh action.
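As a rough illustration of how the top summary counts could be derived, the sketch below aggregates already-classified states into the three ratio-style tiles. The function name, the input shapes, and the choice to count idle components in the total are assumptions made for illustration, not Edka's actual implementation.

```python
# Hypothetical sketch: names and input shapes are illustrative only.
FLAGGED_POD_STATES = {"pending", "failed", "crashing"}


def summarize(component_states, node_ready_flags, pod_states):
    """Build the summary counts from per-object classifications.

    component_states: list of strings such as "healthy" or "degraded"
    node_ready_flags: list of booleans, one per node
    pod_states:       list of strings such as "running" or "pending"
    """
    return {
        # (healthy components, total components); idle components are
        # counted in the total here, which is an assumption.
        "component_health": (
            sum(1 for s in component_states if s == "healthy"),
            len(component_states),
        ),
        # (ready nodes, total nodes)
        "nodes_ready": (
            sum(1 for ready in node_ready_flags if ready),
            len(node_ready_flags),
        ),
        # pending, failed, and crashing pods across all namespaces
        "pods_flagged": sum(1 for s in pod_states if s in FLAGGED_POD_STATES),
    }
```

For example, two healthy components out of three, one not-ready node out of two, and one pending pod would yield (2, 3), (1, 2), and 1 respectively.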

Component health

Edka groups Kubernetes Deployments, StatefulSets, and DaemonSets into workload components.

Each component is classified as:

healthy when all desired replicas are ready and available and no matched pods are problematic
degraded when at least one replica is ready but the workload is not fully healthy
critical when the workload expects replicas but none are ready
idle when the workload is intentionally scaled to zero

Components are sorted so critical and degraded workloads appear before healthy ones.
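The classification and ordering rules above can be sketched as follows. This is a minimal illustration, not Edka's code: the Component fields, the function names, and the position of idle in the sort order are assumptions, and readiness is reduced to a single ready-replica count for brevity.

```python
from dataclasses import dataclass

# Lower rank sorts first. The text only states that critical and degraded
# come before healthy; where idle falls is an assumption.
SEVERITY_RANK = {"critical": 0, "degraded": 1, "idle": 2, "healthy": 3}


@dataclass
class Component:
    name: str
    desired: int           # replicas the workload expects
    ready: int             # replicas currently ready
    problematic_pods: int  # matched pods that are pending, failed, or crashing


def classify(c: Component) -> str:
    """Map a workload component to healthy, degraded, critical, or idle."""
    if c.desired == 0:
        return "idle"      # intentionally scaled to zero
    if c.ready == 0:
        return "critical"  # replicas expected, none ready
    if c.ready >= c.desired and c.problematic_pods == 0:
        return "healthy"
    return "degraded"      # at least one ready, but not fully healthy


def sort_for_triage(components):
    """Order components so the ones needing attention come first."""
    return sorted(components, key=lambda c: (SEVERITY_RANK[classify(c)], c.name))
```

Sorting by name within each severity group is only there to keep the order stable in this sketch.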

Pod state

Monitoring classifies each pod as running, pending, failed, crashing, completed, or unknown.

Crash detection covers common Kubernetes container waiting reasons such as:

CrashLoopBackOff
CreateContainerConfigError
CreateContainerError
ErrImagePull
ImagePullBackOff
RunContainerError
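A minimal sketch of how a pod could be mapped to one of these categories, using the reasons listed above. The function name, the rule that a crash reason takes precedence over the pod phase, and the mapping of the Succeeded phase to completed are assumptions for illustration.

```python
# Container reasons treated as "crashing", taken from the list above.
CRASH_REASONS = frozenset({
    "CrashLoopBackOff",
    "CreateContainerConfigError",
    "CreateContainerError",
    "ErrImagePull",
    "ImagePullBackOff",
    "RunContainerError",
})

# Standard Kubernetes pod phases mapped to Monitoring categories.
# Mapping "Succeeded" to "completed" is an assumption.
PHASE_TO_STATE = {
    "Running": "running",
    "Pending": "pending",
    "Failed": "failed",
    "Succeeded": "completed",
}


def classify_pod(phase, container_reasons=()):
    """Classify a pod from its phase and its containers' state reasons."""
    # Assumption: any crash reason wins over the phase, so a Pending pod
    # stuck in ImagePullBackOff is reported as crashing.
    if any(reason in CRASH_REASONS for reason in container_reasons):
        return "crashing"
    return PHASE_TO_STATE.get(phase, "unknown")
```

With this ordering, a pod whose phase is Running but whose container reports CrashLoopBackOff is classified as crashing rather than running.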

Use Monitoring for the aggregate view and Diagnostics when you need warning events or deeper Kubernetes troubleshooting context.

Resource pressure

The resource pressure panel compares:

CPU and memory usage from the Kubernetes metrics API
allocatable cluster CPU and memory
configured workload requests
configured workload limits

If node usage samples are missing, the snapshot warning area explains that usage data could not be collected. Capacity, allocatable, requests, and limits can still be displayed from Kubernetes resource specs.
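The comparison can be pictured as expressing usage, requests, and limits as fractions of allocatable capacity, with usage left empty when the metrics API returns no samples. The function name, the result keys, and the warning text below are hypothetical; values are assumed to be already converted to a common unit such as cores or bytes.

```python
from typing import Optional


def resource_pressure(
    usage: Optional[float],
    allocatable: float,
    requests: float,
    limits: float,
) -> dict:
    """Express usage, requests, and limits as fractions of allocatable."""
    if allocatable <= 0:
        raise ValueError("allocatable must be positive")

    result = {
        # Requests and limits come from resource specs, so they are
        # available even without the metrics API.
        "requests_ratio": requests / allocatable,
        "limits_ratio": limits / allocatable,
        "usage_ratio": None,
        "warning": None,
    }
    if usage is None:
        # Mirrors the snapshot warning described above (wording is made up).
        result["warning"] = "usage samples unavailable"
    else:
        result["usage_ratio"] = usage / allocatable
    return result
```

A limits ratio above 1.0 simply means limits are overcommitted relative to allocatable capacity, which Kubernetes permits.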

Persistent disks

The Persistent Disks table lists persistent volume claims with:

namespace and claim name
bind phase
used and total capacity when node volume stats are available
storage class
pods currently mounting the claim

Disk usage is considered high at 85% and critical at 95%. These thresholds also feed the global notification system.
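The thresholds can be sketched as a small classifier. Treating each threshold as inclusive (at or above 85% and 95%) is an assumption, as are the function name and the "ok" and "unknown" labels.

```python
HIGH_PERCENT = 85      # from the text above
CRITICAL_PERCENT = 95  # from the text above


def disk_severity(used_bytes, capacity_bytes):
    """Classify PVC usage as ok, high, critical, or unknown."""
    # Volume stats may be missing, in which case usage cannot be judged.
    if used_bytes is None or not capacity_bytes:
        return "unknown"
    # Integer comparison avoids floating-point rounding at the boundaries.
    if used_bytes * 100 >= CRITICAL_PERCENT * capacity_bytes:
        return "critical"
    if used_bytes * 100 >= HIGH_PERCENT * capacity_bytes:
        return "high"
    return "ok"
```

Under these assumptions, a 100 GiB claim with 90 GiB used is classified as high, and the same claim with 96 GiB used is classified as critical.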

Monitoring and notifications

The same live cluster inspection that powers Monitoring is used by Edka’s global notification refresh.

When Edka detects conditions such as not-ready nodes, failing pods, degraded workloads, unbound PVCs, or high disk usage, it creates active notifications with direct actions back to Monitoring or Diagnostics.

See Platform Notifications for the full notification workflow and history.

When to use Monitoring vs Metrics

Use Monitoring when you need:

workload readiness at a glance
node and pod health summaries
host filesystem and inode pressure when Node Exporter metrics are available
PVC capacity and mount visibility
resource pressure without writing PromQL
a fast operational triage view

Use Metrics when you need:

PromQL exploration
historical timeseries context
scrape target visibility
application metrics exposed by workloads