Cluster Observability
The Observability workspace is the cluster-level surface for monitoring, metrics, logs, diagnostics, and observability service management.
Open Clusters → Observability. This workspace replaces the old standalone Diagnostics cluster tab and groups the operational views under one section.
Available tabs
The workspace currently exposes these tabs:
- Monitoring for workload health, node pressure, pod state, and persistent disk usage
- Diagnostics for Kubernetes and K3s troubleshooting
- Logs for VictoriaLogs-backed retained log search
- Metrics for VictoriaMetrics-backed PromQL exploration
- Alerts for custom PromQL alert rules, predefined alert packs, and current firing state
- Settings for installing, updating, exposing, or removing the services
Built-in alerting still appears through the global Notifications feed. The cluster Alerts tab adds user-managed PromQL rules and alert packs evaluated from the cluster metrics store.
Monitoring
The Monitoring tab gives you a live cluster health snapshot without requiring VictoriaMetrics to be installed.
It summarizes:
- workload component health for Deployments, StatefulSets, and DaemonSets
- ready and not-ready nodes
- running, pending, failed, crashing, completed, and unknown pods
- CPU and memory pressure from Kubernetes capacity, allocatable, requests, limits, and metrics API usage samples
- persistent volume claim binding, capacity, usage, storage class, and mount information
- snapshot warnings when a Kubernetes lookup or node usage lookup fails
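As a rough illustration of how a pressure figure can be derived from the values listed above, the sketch below compares requests and sampled usage against allocatable capacity. The `NodeResources` type, its field names, and the 80% and 90% thresholds are assumptions made for this example; they are not Edka's actual calculation.

```python
from dataclasses import dataclass


@dataclass
class NodeResources:
    """Hypothetical per-node CPU figures, all in millicores."""
    allocatable: int
    requested: int
    used: int  # latest metrics API usage sample


def pressure_ratio(value: int, allocatable: int) -> float:
    """Return value as a fraction of allocatable (0.0 when allocatable is 0)."""
    return value / allocatable if allocatable > 0 else 0.0


def classify(ratio: float, warn: float = 0.8, critical: float = 0.9) -> str:
    """Map a ratio to a label; the thresholds are illustrative assumptions."""
    if ratio >= critical:
        return "critical"
    if ratio >= warn:
        return "warning"
    return "ok"


node = NodeResources(allocatable=4000, requested=3400, used=1200)
request_pressure = classify(pressure_ratio(node.requested, node.allocatable))
usage_pressure = classify(pressure_ratio(node.used, node.allocatable))
```

Keeping request pressure and usage pressure separate is useful because a node can be fully reserved by requests while its sampled usage stays low, or the reverse.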
The tab refreshes automatically and also includes a manual refresh action. For the dedicated workflow, see Cluster Monitoring.
Metrics
The Metrics tab is a PromQL explorer backed by the in-cluster VictoriaMetrics store.
You can:
- enter a raw metric name or a PromQL expression
- use autocomplete while typing metric names
- browse the metric catalog and filter it by category
- switch the lookback window from 5m to 7d
- inspect returned series in a table
- chart the returned series and zoom into a narrower time window
- open the VictoriaMetrics UI (vmui) when the service is privately exposed
The explorer also shows metric catalog size, current result series count, and scrape target health from the VictoriaMetrics overview.
If an instant query does not return a live sample, Edka can fall back to the latest recent sample from a range window so the explorer still shows the newest known data point.
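The fallback described above can be sketched as follows. The input shapes follow the Prometheus-compatible query API response format (a `value` pair for instant results, a `values` list for range results); the function names are hypothetical, and the sketch simplifies to a single newest sample rather than one per series. It is not Edka's implementation.

```python
from typing import Optional, Tuple

# (unix timestamp, value as string), as in the Prometheus query API
Sample = Tuple[float, str]


def latest_from_range(range_result: list) -> Optional[Sample]:
    """Pick the newest sample across all series of a range (matrix) result."""
    samples = [tuple(s) for series in range_result for s in series.get("values", [])]
    return max(samples, key=lambda s: s[0]) if samples else None


def newest_sample(instant_result: list, range_result: list) -> Optional[Sample]:
    """Prefer the instant sample; otherwise fall back to the range window."""
    if instant_result:
        return tuple(instant_result[0]["value"])
    return latest_from_range(range_result)


instant = []  # the instant query returned no live sample
matrix = [
    {"metric": {"__name__": "up"}, "values": [[1700000000, "1"], [1700000060, "0"]]},
]
fallback = newest_sample(instant, matrix)
```

Here `fallback` holds the most recent entry from the range window, which is what lets the explorer show the newest known data point when no live sample exists.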
For the dedicated Metrics workflow, see Cluster Metrics.
Alerts
The Alerts tab lets you create and edit cluster-scoped PromQL alert rules.
Rules include a query, comparator, threshold, for duration, evaluation
interval, severity, labels, annotations, and optional routing labels for team or
user ownership.
You can test a rule against VictoriaMetrics before saving it. Saved rules are
evaluated inside the cluster by edka-agent; current state shows whether rules
are inactive, pending, firing, disabled, or not evaluated yet.
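To make the relationship between comparator, threshold, for duration, and rule state concrete, here is a minimal sketch. The `Rule` fields, the `rule_state` function, and the example query are assumptions for illustration; the actual evaluation logic in edka-agent is not documented here.

```python
import operator
from dataclasses import dataclass
from typing import Optional

COMPARATORS = {
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
    "==": operator.eq, "!=": operator.ne,
}


@dataclass
class Rule:
    """Hypothetical rule shape; the field names are illustrative."""
    query: str
    comparator: str
    threshold: float
    for_seconds: int
    enabled: bool = True


def rule_state(rule: Rule, value: Optional[float], breached_for: int) -> str:
    """Return the state for one evaluation.

    value is the latest query result (None if the rule has not been evaluated);
    breached_for is how many seconds the condition has held so far.
    """
    if not rule.enabled:
        return "disabled"
    if value is None:
        return "not evaluated"
    if not COMPARATORS[rule.comparator](value, rule.threshold):
        return "inactive"
    return "firing" if breached_for >= rule.for_seconds else "pending"


rule = Rule(query="sum(rate(container_cpu_usage_seconds_total[5m]))",
            comparator=">", threshold=2.0, for_seconds=300)
states = [rule_state(rule, 1.0, 0), rule_state(rule, 3.0, 60), rule_state(rule, 3.0, 300)]
```

In this sketch a rule moves from pending to firing only after the condition has held for the full for duration, which mirrors how the for clause works in Prometheus-style alerting.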
Alert packs provide predefined editable rules for deployments, PostgreSQL, storage, cluster health, NAT Gateway, and host infrastructure. Host infrastructure packs require Node Exporter metrics.
For the detailed workflow, see Cluster Alerts.
Logs
The Logs tab is a retained-log explorer backed by VictoriaLogs.
You can:
- run free-text or LogsQL-style filter queries
- choose the lookback window and result limit
- apply quick filters derived from the current result set
- inspect normalized log formatting in a structured table
- open the VictoriaLogs UI when the service is privately exposed
Quick filters are generated from current results and currently cover:
- namespace
- pod name
- container name
- HTTP method
- request path
- status code
This makes it easier to narrow a noisy cluster-wide result set down to the exact pod, path, or response class you care about.
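One way to picture quick filters is as value counts computed over the current result set. The sketch below assumes normalized records are plain dictionaries with the hypothetical keys shown in `FACET_FIELDS`; the real field names and ranking used by Edka may differ.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical field names for normalized log records.
FACET_FIELDS = ["namespace", "pod", "container", "method", "path", "status"]


def quick_filters(records: List[dict], top: int = 5) -> Dict[str, List[Tuple]]:
    """Count the most common values per facet field in the current results."""
    facets = {}
    for field in FACET_FIELDS:
        counts = Counter(r[field] for r in records if r.get(field) is not None)
        if counts:
            facets[field] = counts.most_common(top)
    return facets


def apply_filter(records: List[dict], field: str, value) -> List[dict]:
    """Keep only the records whose field equals the selected value."""
    return [r for r in records if r.get(field) == value]


records = [
    {"namespace": "shop", "pod": "api-1", "method": "GET", "path": "/cart", "status": 200},
    {"namespace": "shop", "pod": "api-1", "method": "POST", "path": "/cart", "status": 500},
    {"namespace": "kube-system", "pod": "coredns-1"},
]
facets = quick_filters(records)
errors = apply_filter(records, "status", 500)
```

Because the facets come from the records already returned, a field that never appears in the results (here `container`) produces no quick filter at all.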
For the dedicated Logs workflow, see Cluster Logs.
Diagnostics
Diagnostics now lives inside Observability.
It keeps the Kubernetes-centric troubleshooting workflow in one place:
- warning summary derived from node readiness, pod state, events, and K3s signals
- Kubernetes warning events
- K3s control plane signals
- pods requiring attention
- manual snapshot refresh for cluster state
Diagnostics becomes fully available once the cluster kubeconfig is ready.
For the detailed diagnostics view itself, see Cluster Diagnostics.
Notifications and alerting
Edka builds organization-level notifications from cluster and Kubernetes state. They appear in the header notification menu, the dashboard attention summary, and the full Notifications page.
Notifications currently cover:
- failed or errored clusters
- GitOps error states
- available Kubernetes and add-on updates
- failed add-ons
- not-ready nodes
- failed, crashing, or stuck-pending pods
- critical or degraded workload components
- unbound persistent volume claims
- persistent volumes at high or critical usage
- failed CronJob Jobs
- inspection failures when Edka cannot read a cluster during notification refresh
Notifications are severity-ranked as critical, warning, or info, and are
categorized as cluster, kubernetes, addon, storage, or alert.
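A severity-ranked feed can be modelled as a sort over a fixed severity order. The `Notification` type and `ranked` helper below are hypothetical and only illustrate the ordering; the severity and category names are the ones listed above.

```python
from dataclasses import dataclass
from typing import List

SEVERITY_RANK = {"critical": 0, "warning": 1, "info": 2}
CATEGORIES = {"cluster", "kubernetes", "addon", "storage", "alert"}


@dataclass
class Notification:
    """Hypothetical notification shape for illustration."""
    severity: str
    category: str
    message: str


def ranked(notifications: List[Notification]) -> List[Notification]:
    """Sort most severe first; sorted() is stable, so ties keep input order."""
    return sorted(notifications, key=lambda n: SEVERITY_RANK[n.severity])


feed = ranked([
    Notification("info", "addon", "Add-on update available"),
    Notification("critical", "storage", "Persistent volume at critical usage"),
    Notification("warning", "kubernetes", "Pod stuck pending"),
])
order = [n.severity for n in feed]
```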
Settings and managed services
The Settings tab manages two in-cluster services:
VictoriaMetrics
From Settings you can install, update, or remove VictoriaMetrics and configure:
- retention period
- scrape interval
- storage size and storage class
- CPU and memory requests and limits
- optional node metrics through the Node Exporter add-on
The overview panel also shows pod readiness, restart counts, scrape job health, resource usage, PVC usage, and warnings.
VictoriaLogs
From Settings you can install, update, or remove VictoriaLogs and configure:
- retention period
- maximum disk-usage threshold before retention pressure kicks in
- storage size and storage class
- server CPU and memory requests and limits
- collector CPU and memory requests and limits
- exclusion filters
The overview panel shows separate server and collector runtime health, storage, resource usage, and warnings.
Private UI exposure with Tailscale
Observability UI exposure is managed from Settings.
Current behavior:
- Edka exposes observability UIs privately through Tailscale-aware traffic classes
- cluster write access is required to manage exposure
- the Tailscale operator must already be installed on the cluster
- the resulting UI URLs are intended for private access, not public internet exposure
When exposure is configured, Edka shows direct buttons to open vmui or the
VictoriaLogs UI from the corresponding tabs.
Managed app model
VictoriaMetrics and VictoriaLogs are installed as managed cluster apps from the Observability workspace. They are not handled as add-ons in the same way as Gateway controllers or certificate operators.
That means the workspace owns:
- installation and removal
- runtime status and progress
- retention and storage configuration
- private UI exposure