ZEKTOR.IO Docs

Monitoring

Real-time metrics, dashboards, and query insights for your Zektor.io database instances.

Overview

Zektor.io provides built-in monitoring for all database instances. Track performance metrics, query patterns, and resource utilization directly from your dashboard — no external tools required.

Instance Dashboard

Every instance has a dedicated dashboard showing real-time and historical metrics at a glance:

  • Status — Current instance health (Active, Provisioning, Stopped)
  • Uptime — How long your instance has been running
  • Connection count — Current active connections
  • Resource utilization — CPU, memory, and storage usage

Reading the Dashboard Charts

The dashboard displays time-series charts for key metrics. Here's what to look for:

Chart | Healthy range | Warning signs
CPU Usage | Below 70% average | Sustained >80% indicates a need to scale up
Memory Usage | Below 80% | Sustained >90% may cause out-of-memory (OOM) errors or swapping
Disk Usage | Below 70% | Above 80% triggers auto-scaling (if enabled)
Active Connections | Well below max_connections | New connections fail once the limit is reached
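As a minimal sketch, the thresholds in this table can be encoded in a small Python helper for a custom monitoring script. It assumes you already have an averaged utilization percentage for each chart (collecting it is not shown), and it covers only CPU, memory, and disk because the table gives no numeric threshold for connections. THRESHOLDS and classify are illustrative names, not part of any Zektor.io API.

```python
from typing import Dict, Literal, Tuple

Status = Literal["healthy", "watch", "warning"]

# (healthy_below, warning_above) in percent, from the dashboard table.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "cpu": (70.0, 80.0),
    "memory": (80.0, 90.0),
    "disk": (70.0, 80.0),
}


def classify(metric: str, percent: float) -> Status:
    """Classify one averaged utilization reading (0-100) for a chart."""
    healthy_below, warning_above = THRESHOLDS[metric]
    if percent < healthy_below:
        return "healthy"
    if percent > warning_above:
        return "warning"
    # Between the healthy range and the warning sign: keep an eye on it.
    return "watch"


if __name__ == "__main__":
    for metric, value in [("cpu", 45.0), ("memory", 85.0), ("disk", 82.5)]:
        print(f"{metric}: {value}% -> {classify(metric, value)}")
```

A single reading cannot capture the table's "sustained" criterion; to apply it, evaluate a series of readings and act only when several consecutive ones are flagged.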

Metrics

PostgreSQL Metrics

Navigate to the Metrics tab of your PostgreSQL instance to view:

Metric | Description
CPU Usage | Percentage of allocated CPU in use
Memory Usage | RAM utilization, including shared buffers and cache
Disk Usage | SSD storage consumed vs. allocated
WAL Storage | Write-ahead log (WAL) storage usage (important for point-in-time recovery, PITR)
Active Connections | Number of currently connected clients
Transactions/sec | Rate of committed transactions
Rows Read/Written | Row-level I/O activity
Replication Lag | Delay between primary and replica (if applicable)
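Counters such as committed transactions are cumulative in PostgreSQL (for example, xact_commit in the pg_stat_database view), so a per-second rate comes from two samples divided by the time between them. The dashboard computes this for you; the sketch below only shows the arithmetic for a custom script, assuming you collect the samples yourself. How Zektor.io derives its own chart is not documented here, and CounterSample and rate_per_second are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CounterSample:
    """One reading of a cumulative counter, e.g. pg_stat_database.xact_commit."""
    timestamp: float  # seconds since the epoch, e.g. from time.time()
    value: int


def rate_per_second(earlier: CounterSample, later: CounterSample) -> Optional[float]:
    """Average increase per second between two samples of the same counter."""
    elapsed = later.timestamp - earlier.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    delta = later.value - earlier.value
    if delta < 0:
        # The counter went backwards, so statistics were reset in between.
        return None
    return delta / elapsed
```

For example, 1,000 commits at t=0 s and 1,500 commits at t=10 s give a rate of 50.0 transactions/sec. Returning None on a reset avoids reporting a negative rate.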

Valkey Metrics

For Valkey (cache) instances:

Metric | Description
CPU Usage | CPU utilization
Memory Usage | Used memory vs. allocated
Connected Clients | Active client connections
Commands/sec | Rate of executed commands
Hit Rate | Cache hit ratio (higher is better; aim for >90%)
Evicted Keys | Number of keys evicted due to memory pressure
Keyspace Size | Total number of keys stored
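Valkey reports the raw counters behind the hit rate in the Stats section of its INFO command output (keyspace_hits and keyspace_misses). A minimal sketch for computing the ratio from that text, assuming you have already captured the INFO output as a string with a client library or valkey-cli. These counters are cumulative since the server started or statistics were last reset, so the result is a long-run average rather than a current value. parse_info and hit_rate are illustrative names.

```python
from typing import Dict, Optional


def parse_info(text: str) -> Dict[str, str]:
    """Parse INFO output ("key:value" lines, "# Section" headers) into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def hit_rate(fields: Dict[str, str]) -> Optional[float]:
    """Percentage of key lookups that found a key; None if there were no lookups."""
    hits = int(fields.get("keyspace_hits", "0"))
    misses = int(fields.get("keyspace_misses", "0"))
    lookups = hits + misses
    if lookups == 0:
        return None
    return 100.0 * hits / lookups


sample = "# Stats\r\nkeyspace_hits:900\r\nkeyspace_misses:100\r\nevicted_keys:5\r\n"
info = parse_info(sample)
print(hit_rate(info), info["evicted_keys"])  # 90.0 5
```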

Common Issues and What to Check

Use this table to quickly diagnose problems:

Symptom | Metrics to check | Likely cause | Action
Slow queries | CPU, Rows Read | Missing indexes, full table scans | Add indexes, optimize queries
Connection errors | Active Connections | Hitting max_connections | Use connection pooling, scale up
Application timeouts | CPU, Memory | Resource saturation | Scale up to a larger tier
Growing disk usage | Disk Usage, WAL Storage | Data growth, WAL accumulation | Enable auto-scaling, remove unneeded data
High cache miss rate | Hit Rate, Memory | Working set exceeds RAM | Scale up, review TTLs
Evicted keys (Valkey) | Evicted Keys, Memory | Memory pressure | Scale up, reduce key count

Query Insights (PostgreSQL)

Using pg_stat_statements

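Enable the pg_stat_statements extension to track query performance. The extension only records statistics when its library is loaded through the shared_preload_libraries setting; if querying the pg_stat_statements view fails with an error mentioning shared_preload_libraries, see Extensions.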

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

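Then inspect your most expensive queries (the column names below are for PostgreSQL 13 and later; earlier versions use total_time and mean_time):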

-- Top 10 queries by total execution time
SELECT
  query,
  calls,
  round(total_exec_time::numeric, 2) AS total_time_ms,
  round(mean_exec_time::numeric, 2) AS avg_time_ms,
  rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
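If a script fetches these rows through a PostgreSQL driver (not shown), a short helper can rank them and express each query's share of the total execution time, which makes it easier to see whether a few statements dominate. This is a sketch: StatementStats and top_by_share are hypothetical names, and shares are relative to whatever rows you pass in, so pass all rows rather than only the top 10 if you want shares of the overall total.

```python
from typing import List, NamedTuple, Tuple


class StatementStats(NamedTuple):
    """One row from pg_stat_statements (times in milliseconds)."""
    query: str
    calls: int
    total_exec_time: float


def top_by_share(rows: List[StatementStats], limit: int = 10) -> List[Tuple[str, float, float]]:
    """Return (query, % of total time, mean ms per call), most expensive first."""
    total = sum(row.total_exec_time for row in rows)
    ranked = sorted(rows, key=lambda row: row.total_exec_time, reverse=True)
    result = []
    for row in ranked[:limit]:
        share = 100.0 * row.total_exec_time / total if total else 0.0
        mean = row.total_exec_time / row.calls if row.calls else 0.0
        result.append((row.query, round(share, 1), round(mean, 2)))
    return result
```

A query with a modest mean time but a very large call count can still top this list, which is why ranking by total time complements the slow-query filter.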

Finding Slow Queries

-- Queries with average execution time > 100ms
SELECT
  query,
  calls,
  round(mean_exec_time::numeric, 2) AS avg_time_ms
FROM pg_stat_statements
WHERE mean_exec_time > 100
ORDER BY mean_exec_time DESC;

Finding Frequently Called Queries

-- Most frequently called queries
SELECT
  query,
  calls,
  round(mean_exec_time::numeric, 2) AS avg_time_ms,
  rows
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 20;

Finding Missing Indexes

Identify tables where sequential scans dominate over index scans:

SELECT
  relname AS table,
  seq_scan,
  idx_scan,
  n_live_tup,
  -- idx_scan is NULL for tables that have no indexes, so treat it as 0
  round(100.0 * seq_scan / NULLIF(seq_scan + COALESCE(idx_scan, 0), 0), 1) AS seq_pct
FROM pg_stat_user_tables
WHERE seq_scan > 100
ORDER BY seq_scan DESC
LIMIT 20;

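Tables with a high seq_pct (above 50%) and many rows (see n_live_tup in pg_stat_user_tables) are likely candidates for new indexes. Small tables are often scanned sequentially by design, because reading a few pages directly is cheaper than an index lookup.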
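The same check can be scripted. This sketch mirrors the seq_pct calculation in the query, treats a NULL idx_scan (reported for tables that have no indexes) as zero, and adds a row-count filter. The 50% cutoff comes from the guidance above; the default min_rows of 10,000 is an arbitrary assumption to tune for your workload, and TableScanStats and index_candidates are hypothetical names.

```python
from typing import List, NamedTuple, Optional, Tuple


class TableScanStats(NamedTuple):
    """Selected columns from pg_stat_user_tables."""
    table: str
    seq_scan: int
    idx_scan: Optional[int]  # NULL when the table has no indexes
    n_live_tup: int


def seq_pct(row: TableScanStats) -> Optional[float]:
    """Share of scans that were sequential, in percent."""
    total = row.seq_scan + (row.idx_scan or 0)
    if total == 0:
        return None
    return round(100.0 * row.seq_scan / total, 1)


def index_candidates(
    rows: List[TableScanStats], min_pct: float = 50.0, min_rows: int = 10_000
) -> List[Tuple[str, float]]:
    """Tables with mostly sequential scans and enough rows to matter."""
    candidates = []
    for row in rows:
        pct = seq_pct(row)
        if pct is not None and pct > min_pct and row.n_live_tup >= min_rows:
            candidates.append((row.table, pct))
    return sorted(candidates, key=lambda item: item[1], reverse=True)
```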

Resetting Statistics

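Clear the collected statistics to start a fresh measurement window, for example after adding indexes or deploying query changes:

SELECT pg_stat_statements_reset();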

Monitoring Connections

Active Connections

SELECT
  datname,
  usename,
  application_name,
  client_addr,
  state,
  query_start,
  query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY query_start;

Connection Count by State

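SELECT
  state,
  count(*)
FROM pg_stat_activity
GROUP BY state
ORDER BY count(*) DESC;

Compare the total with your instance's max_connections setting (SHOW max_connections returns the current value).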
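To turn the per-state counts into a utilization figure, compare their sum with max_connections. This sketch assumes PostgreSQL's default of 3 slots held back for superusers (superuser_reserved_connections); your instance may use a different value, which SHOW superuser_reserved_connections reports. Rows with a NULL state are mostly background processes and are skipped, so treat the result as an approximation. Headroom and connection_headroom are hypothetical names.

```python
from typing import Dict, NamedTuple, Optional


class Headroom(NamedTuple):
    in_use: int
    free: int
    pct_used: float


def connection_headroom(
    counts_by_state: Dict[Optional[str], int],
    max_connections: int,
    reserved: int = 3,  # assumed superuser_reserved_connections (PostgreSQL default)
) -> Headroom:
    """Approximate client connection usage from the count-by-state query."""
    # NULL-state rows are mostly background processes, not client connections.
    in_use = sum(n for state, n in counts_by_state.items() if state is not None)
    available = max_connections - reserved
    if available <= 0:
        raise ValueError("max_connections must exceed the reserved slots")
    return Headroom(in_use, available - in_use, round(100.0 * in_use / available, 1))
```

Many idle connections with few active ones usually point to connection pooling, rather than a larger tier, as the fix.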

Database Size Monitoring

Check Database Size

SELECT
  pg_database.datname,
  pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database
ORDER BY pg_database_size(pg_database.datname) DESC;

Check Table Sizes

-- relid avoids quoting problems with mixed-case or special-character table names
SELECT
  relname AS table_name,
  -- total_size includes indexes and TOAST data; table_size is the main data fork only
  pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
  pg_size_pretty(pg_relation_size(relid)) AS table_size,
  pg_size_pretty(pg_indexes_size(relid)) AS index_size
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(relid) DESC;
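Because auto-scaling triggers above 80% disk usage, it can help to estimate how soon a database will cross that line. A minimal sketch, assuming linear growth between two samples of pg_database_size() (raw bytes, not the pg_size_pretty text) and a known allocated storage size; real growth is rarely linear, so treat the result as a rough estimate. days_until_threshold is a hypothetical name.

```python
from typing import Optional


def days_until_threshold(
    allocated_bytes: int,
    earlier_bytes: int,
    current_bytes: int,
    days_between: float,
    threshold_pct: float = 80.0,
) -> Optional[float]:
    """Days until usage reaches threshold_pct of allocated storage, assuming linear growth.

    Returns 0.0 if the threshold is already reached and None if usage is not growing.
    """
    if days_between <= 0:
        raise ValueError("days_between must be positive")
    target = allocated_bytes * threshold_pct / 100.0
    if current_bytes >= target:
        return 0.0
    growth_per_day = (current_bytes - earlier_bytes) / days_between
    if growth_per_day <= 0:
        return None
    return (target - current_bytes) / growth_per_day


GiB = 1024 ** 3
print(days_until_threshold(100 * GiB, 50 * GiB, 60 * GiB, days_between=10))  # 20.0
```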

Monitor WAL Storage

WAL storage is critical for point-in-time recovery. If WAL storage grows unexpectedly, check these common causes:

  • High write volume generates more WAL
  • Long backup retention keeps more WAL segments
  • Long-running transactions can hold back cleanup; list the oldest open transactions:
    SELECT pid, state, xact_start, now() - xact_start AS xact_age, query
    FROM pg_stat_activity
    WHERE xact_start IS NOT NULL
    ORDER BY xact_start
    LIMIT 10;
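A monitoring script can apply a cutoff to open transactions reported by pg_stat_activity. In this sketch, each session is a (pid, state, xact_start) tuple with a timezone-aware timestamp, as many drivers return for timestamptz columns; the five-minute max_age is an arbitrary example threshold, not a Zektor.io recommendation, and long_running is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Tuple

Session = Tuple[int, Optional[str], Optional[datetime]]  # (pid, state, xact_start)


def long_running(
    sessions: Iterable[Session],
    now: datetime,
    max_age: timedelta = timedelta(minutes=5),  # example threshold
) -> List[Tuple[int, Optional[str], timedelta]]:
    """Return sessions whose open transaction is older than max_age, oldest first."""
    flagged = []
    for pid, state, xact_start in sessions:
        if xact_start is None:
            continue  # no open transaction
        age = now - xact_start
        if age > max_age:
            flagged.append((pid, state, age))
    flagged.sort(key=lambda item: item[2], reverse=True)
    return flagged


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sessions = [
    (101, "active", now - timedelta(seconds=30)),
    (102, "idle in transaction", now - timedelta(hours=2)),
    (103, "active", now - timedelta(minutes=10)),
    (104, "idle", None),
]
for pid, state, age in long_running(sessions, now):
    print(pid, state, age)
```

Sessions in the "idle in transaction" state are worth special attention: they hold a transaction open while doing no work, often because an application forgot to commit.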

Alerting

Currently, Zektor.io does not provide built-in alerting. We recommend:

  • Monitoring your dashboard regularly for usage spikes
  • Setting up external monitoring with tools like Grafana, Datadog, or custom scripts that query pg_stat_statements
  • Watching disk usage to avoid running out of storage

Built-in alerting is on our roadmap.
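Until built-in alerting is available, a custom script can approximate the dashboard's "sustained" warning signs by alerting only when several consecutive readings exceed a threshold, which avoids paging on brief spikes. A minimal sketch; SustainedThreshold is a hypothetical name, the window size is an assumption, and polling and alert delivery (email, chat, webhook) are left out.

```python
from collections import deque
from typing import Deque


class SustainedThreshold:
    """Fire only when the last `window` readings all exceed `threshold`."""

    def __init__(self, threshold: float, window: int) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        self.threshold = threshold
        self.readings: Deque[float] = deque(maxlen=window)

    def observe(self, value: float) -> bool:
        """Record a reading and report whether the alert condition holds."""
        self.readings.append(value)
        window_full = len(self.readings) == self.readings.maxlen
        return window_full and all(v > self.threshold for v in self.readings)


cpu_alert = SustainedThreshold(threshold=80.0, window=3)  # e.g. three polls in a row
for reading in [85.0, 90.0, 70.0, 82.0, 88.0, 91.0]:
    if cpu_alert.observe(reading):
        print(f"CPU above 80% for 3 consecutive readings (latest {reading}%)")
```

The fixed-size deque discards the oldest reading automatically, so one dip below the threshold resets the alert until the window fills with high readings again.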

Best Practices

  1. Enable pg_stat_statements on all production databases to track query performance
  2. Monitor disk usage regularly — auto-scaling handles growth but sudden spikes warrant investigation
  3. Watch connection counts — approaching max_connections can cause connection failures
  4. Review slow queries weekly to identify optimization opportunities
  5. Track WAL storage — unexpectedly high WAL usage may indicate issues
  6. Check your dashboard at least daily for production databases as part of a regular monitoring routine

Next Steps

  • Extensions — Install pg_stat_statements and other monitoring tools
  • Scaling — Upgrade resources if you're hitting limits
  • Connecting — Configure connection pooling to manage connections
  • Troubleshooting — Diagnose and fix common issues
