Cluster Observability

The Observability workspace is the cluster-level surface for monitoring, metrics, logs, diagnostics, and observability service management.

Open Clusters → Observability. This workspace replaces the old standalone Diagnostics cluster tab and groups the operational views under one section.

Available tabs

The workspace currently exposes these tabs:

Monitoring for workload health, node pressure, pod state, and persistent disk usage
Diagnostics for Kubernetes and K3s troubleshooting
Logs for VictoriaLogs-backed retained log search
Metrics for VictoriaMetrics-backed PromQL exploration
Alerts for custom PromQL alert rules, predefined alert packs, and current firing state
Settings for installing, updating, exposing, or removing the VictoriaMetrics and VictoriaLogs services

Built-in alerting still appears through the global Notifications feed. The cluster Alerts tab adds user-managed PromQL rules and alert packs, both evaluated against the cluster metrics store.

Monitoring

The Monitoring tab gives you a live cluster health snapshot without requiring VictoriaMetrics to be installed.

It summarizes:

workload component health for Deployments, StatefulSets, and DaemonSets
ready and not-ready nodes
running, pending, failed, crashing, completed, and unknown pods
CPU and memory pressure from Kubernetes capacity, allocatable, requests, limits, and metrics API usage samples
persistent volume claim binding, capacity, usage, storage class, and mount information
snapshot warnings when a Kubernetes lookup or node usage lookup fails
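
As a rough illustration of how a pressure figure can be derived from the inputs listed above, the sketch below converts Kubernetes quantity strings to numbers and divides usage by allocatable. The function names are hypothetical and this is not Edka's implementation; only the quantity suffixes (n, u, m for CPU, and Ki, Mi, Gi, Ti for memory) follow the Kubernetes quantity format.

```python
# Hypothetical helpers; they only illustrate the ratio behind a pressure figure.
CPU_DIVISORS = {"n": 1e9, "u": 1e6, "m": 1e3}
MEMORY_FACTORS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity to cores, e.g. '500m' becomes 0.5."""
    suffix = quantity[-1]
    if suffix in CPU_DIVISORS:
        return float(quantity[:-1]) / CPU_DIVISORS[suffix]
    return float(quantity)


def parse_memory(quantity: str) -> float:
    """Convert a binary-suffixed memory quantity to bytes, e.g. '1Ki' becomes 1024."""
    for suffix, factor in MEMORY_FACTORS.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    return float(quantity)  # no suffix: treat the value as plain bytes


def pressure(used: str, allocatable: str, parse) -> float:
    """Return the used/allocatable ratio, using the given quantity parser."""
    return parse(used) / parse(allocatable)


print(pressure("1500m", "4", parse_cpu))       # 0.375
print(pressure("2Gi", "8Gi", parse_memory))    # 0.25
```

The same ratio works for requests or limits in place of usage; decimal memory suffixes such as M or G are left out of this sketch.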

The tab refreshes automatically and also includes a manual refresh action. For the dedicated workflow, see Cluster Monitoring.

Metrics

The Metrics tab is a PromQL explorer backed by the in-cluster VictoriaMetrics store.

You can:

enter a raw metric name or a PromQL expression
use autocomplete while typing metric names
browse the metric catalog and filter it by category
set the lookback window to a value between 5m and 7d
inspect returned series in a table
chart the returned series and zoom into a narrower time window
open the VictoriaMetrics UI (vmui) when the service is privately exposed
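
For readers who want to reproduce an explorer query outside the UI, the sketch below turns a lookback window such as 5m or 7d into a Prometheus-compatible range query URL, which VictoriaMetrics accepts at /api/v1/query_range. The base URL, the target of roughly 300 points per chart, and the function names are assumptions for illustration, not values taken from Edka.

```python
from urllib.parse import urlencode

# Seconds per unit for the window suffixes used in the explorer (m, h, d).
UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400}


def lookback_seconds(window: str) -> int:
    """Convert a window such as '5m' or '7d' to seconds."""
    return int(window[:-1]) * UNIT_SECONDS[window[-1]]


def range_query_url(base_url: str, expr: str, window: str, end: int, points: int = 300) -> str:
    """Build a query_range URL covering `window` seconds up to `end` (Unix time)."""
    span = lookback_seconds(window)
    step = max(1, span // points)  # assumed resolution: about `points` samples
    params = {"query": expr, "start": end - span, "end": end, "step": step}
    return f"{base_url}/api/v1/query_range?{urlencode(params)}"


# Hypothetical in-cluster address; substitute the address of your own service.
print(range_query_url("http://vm.example:8428", "up", "5m", end=1700000000))
# http://vm.example:8428/api/v1/query_range?query=up&start=1699999700&end=1700000000&step=1
```

Widening the window to 7d with the same point budget yields a step of 2016 seconds, which is why longer lookbacks return coarser series.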

The explorer also shows metric catalog size, current result series count, and scrape target health from the VictoriaMetrics overview.

If an instant query does not return a live sample, Edka can fall back to the latest recent sample from a range window so the explorer still shows the newest known data point.
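
The fallback described above can be sketched as follows. This is a minimal illustration that assumes both responses use the Prometheus-compatible JSON shape (a vector for instant queries, a matrix for range queries) and, for brevity, looks at a single newest point rather than one per series. The function name is hypothetical and Edka's actual logic may differ.

```python
from typing import Optional, Tuple


def newest_sample(instant: dict, ranged: dict) -> Optional[Tuple[float, float]]:
    """Return (timestamp, value) from the instant result, else the latest range sample."""
    vector = instant.get("data", {}).get("result", [])
    if vector:
        ts, raw = vector[0]["value"]  # first series only, for brevity
        return float(ts), float(raw)

    # Fallback: scan every series in the range result for the newest sample.
    latest = None
    for series in ranged.get("data", {}).get("result", []):
        for ts, raw in series.get("values", []):
            if latest is None or float(ts) > latest[0]:
                latest = (float(ts), float(raw))
    return latest


empty_instant = {"status": "success", "data": {"resultType": "vector", "result": []}}
range_result = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {"metric": {"__name__": "up"}, "values": [[1700000000, "1"], [1700000060, "0"]]}
        ],
    },
}
print(newest_sample(empty_instant, range_result))  # (1700000060.0, 0.0)
```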

For the dedicated Metrics workflow, see Cluster Metrics.

Alerts

The Alerts tab lets you create and edit cluster-scoped PromQL alert rules. Rules include a query, comparator, threshold, for duration, evaluation interval, severity, labels, annotations, and optional routing labels for team or user ownership.

You can test a rule against VictoriaMetrics before saving it. Saved rules are evaluated inside the cluster by edka-agent; current state shows whether rules are inactive, pending, firing, disabled, or not evaluated yet.
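
To make the relationship between comparator, threshold, for duration, and rule state concrete, here is a minimal sketch of one evaluation step. It follows the usual Prometheus-style semantics (a breached rule is pending until the condition has held for the full for duration, then firing) and is an assumption about behavior, not edka-agent's actual code. The Rule fields and next_state function are hypothetical, and the "not evaluated yet" state is left out.

```python
import operator
from dataclasses import dataclass
from typing import Optional, Tuple

COMPARATORS = {
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
    "==": operator.eq, "!=": operator.ne,
}


@dataclass
class Rule:
    comparator: str
    threshold: float
    for_seconds: float
    enabled: bool = True


def next_state(rule: Rule, value: float, now: float,
               breached_since: Optional[float]) -> Tuple[str, Optional[float]]:
    """Evaluate one tick and return (state, time the breach started)."""
    if not rule.enabled:
        return "disabled", None
    if not COMPARATORS[rule.comparator](value, rule.threshold):
        return "inactive", None  # condition cleared, so the timer resets
    start = now if breached_since is None else breached_since
    if now - start >= rule.for_seconds:
        return "firing", start
    return "pending", start


rule = Rule(comparator=">", threshold=0.9, for_seconds=120)
since = None
for now, value in [(0, 0.95), (60, 0.97), (120, 0.96), (180, 0.5)]:
    state, since = next_state(rule, value, now, since)
    print(now, state)
# 0 pending
# 60 pending
# 120 firing
# 180 inactive
```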

Alert packs provide predefined editable rules for deployments, PostgreSQL, storage, cluster health, NAT Gateway, and host infrastructure. Host infrastructure packs require Node Exporter metrics.

For the detailed workflow, see Cluster Alerts.

Logs

The Logs tab is a retained-log explorer backed by VictoriaLogs.

You can:

run free-text or LogsQL-style filter queries
choose the lookback window and result limit
apply quick filters derived from the current result set
inspect normalized log entries in a structured table
open the VictoriaLogs UI when the service is privately exposed

Quick filters are generated from current results and currently cover:

namespace
pod name
container name
HTTP method
request path
status code

This makes it easier to narrow a noisy cluster-wide result set down to the exact pod, path, or response class you care about.
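
One way to picture quick filters is as value counts computed over the rows currently on screen. The sketch below assumes each log row is a dictionary and uses hypothetical field keys (namespace, pod, container, method, path, status); the real field names in Edka or VictoriaLogs may differ.

```python
from collections import Counter

# Hypothetical keys standing in for the six quick-filter dimensions.
FILTER_FIELDS = ["namespace", "pod", "container", "method", "path", "status"]


def quick_filters(rows):
    """Count distinct values per field across the current result set."""
    facets = {field: Counter() for field in FILTER_FIELDS}
    for row in rows:
        for field in FILTER_FIELDS:
            value = row.get(field)
            if value is not None:
                facets[field][str(value)] += 1
    # Drop fields with no values so only usable filters are offered.
    return {field: counts.most_common() for field, counts in facets.items() if counts}


rows = [
    {"namespace": "shop", "pod": "api-7d9f", "container": "api",
     "method": "GET", "path": "/cart", "status": 200},
    {"namespace": "shop", "pod": "api-7d9f", "container": "api",
     "method": "POST", "path": "/cart", "status": 500},
    {"namespace": "kube-system", "pod": "coredns-5c4b", "container": "coredns"},
]
filters = quick_filters(rows)
print(filters["namespace"])  # [('shop', 2), ('kube-system', 1)]
```

Because the counts come only from the rows already returned, changing the query or the result limit changes which filter values are offered.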

For the dedicated Logs workflow, see Cluster Logs.

Diagnostics

Diagnostics now lives inside Observability.

It keeps the Kubernetes-centric troubleshooting workflow in one place:

warning summary derived from node readiness, pod state, events, and K3s signals
Kubernetes warning events
K3s control plane signals
pods requiring attention
manual snapshot refresh for cluster state

Diagnostics becomes fully available once the cluster kubeconfig is ready.

For the detailed diagnostics view itself, see Cluster Diagnostics.

Notifications and alerting

Edka builds organization-level notifications from cluster and Kubernetes state. They appear in the header notification menu, the dashboard attention summary, and the full Notifications page.

Notifications currently cover:

failed or errored clusters
GitOps error states
available Kubernetes and add-on updates
failed add-ons
not-ready nodes
failed, crashing, or stuck-pending pods
critical or degraded workload components
unbound persistent volume claims
persistent volumes at high or critical usage
failed CronJob Jobs
inspection failures when Edka cannot read a cluster during notification refresh

Notifications are severity-ranked as critical, warning, or info, and are categorized as cluster, kubernetes, addon, storage, or alert.
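
As a small illustration of severity ranking, the sketch below orders a feed so that critical items come first, then warning, then info. The record fields and the rank_notifications helper are hypothetical; only the severity and category values come from the description above.

```python
# Lower rank sorts first; the three levels are the ones named above.
SEVERITY_RANK = {"critical": 0, "warning": 1, "info": 2}


def rank_notifications(notifications):
    """Return notifications ordered from most to least severe (stable sort)."""
    return sorted(notifications, key=lambda n: SEVERITY_RANK[n["severity"]])


feed = [
    {"severity": "info", "category": "addon", "title": "Add-on update available"},
    {"severity": "critical", "category": "kubernetes", "title": "Node not ready"},
    {"severity": "warning", "category": "storage", "title": "Persistent volume at high usage"},
]
for item in rank_notifications(feed):
    print(item["severity"], item["category"], item["title"])
# critical kubernetes Node not ready
# warning storage Persistent volume at high usage
# info addon Add-on update available
```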

Settings and managed services

The Settings tab manages two in-cluster services:

VictoriaMetrics

From Settings you can install, update, or remove VictoriaMetrics and configure:

retention period
scrape interval
storage size and storage class
CPU and memory requests and limits
optional node metrics through the Node Exporter add-on

The overview panel also shows pod readiness, restart counts, scrape job health, resource usage, PVC usage, and warnings.

VictoriaLogs

From Settings you can install, update, or remove VictoriaLogs and configure:

retention period
maximum disk-usage threshold before retention pressure kicks in
storage size and storage class
server CPU and memory requests and limits
collector CPU and memory requests and limits
exclusion filters

The overview panel shows separate server and collector runtime health, storage, resource usage, and warnings.

Private UI exposure with Tailscale

Observability UI exposure is managed from Settings.

Current behavior:

Edka exposes observability UIs privately through Tailscale-aware traffic classes
cluster write access is required to manage exposure
the Tailscale operator must already be installed on the cluster
the resulting UI URLs are intended for private access, not public internet exposure

When exposure is configured, Edka shows direct buttons to open vmui or the VictoriaLogs UI from the corresponding tabs.

Managed app model

VictoriaMetrics and VictoriaLogs are installed as managed cluster apps from the Observability workspace. They are not handled as add-ons in the same way as Gateway controllers or certificate operators.

That means the workspace owns:

installation and removal
runtime status and progress
retention and storage configuration
private UI exposure