Monitoring

Real-time metrics, dashboards, and query insights for your Zektor.io database instances.

Overview

Zektor.io provides built-in monitoring for all database instances. Track performance metrics, query patterns, and resource utilization directly from your dashboard — no external tools required.

Instance Dashboard

Every instance has a dedicated dashboard showing real-time and historical metrics at a glance:

Status — Current instance health (Active, Provisioning, Stopped)
Uptime — How long your instance has been running
Connection count — Current active connections
Resource utilization — CPU, memory, and storage usage

Reading the Dashboard Charts

The dashboard displays time-series charts for key metrics. Here's what to look for:

Chart	Healthy Range	Warning Signs
CPU Usage	Below 70% average	Sustained >80% suggests you need to scale up
Memory Usage	Below 80%	Sustained >90% may cause out-of-memory (OOM) errors or swapping
Disk Usage	Below 70%	Above 80% triggers auto-scaling (if enabled)
Active Connections	Well below max_connections	Once the limit is reached, new connections fail
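If you export these metrics to your own tooling, the thresholds above can be encoded directly. The following Python sketch is an illustration, not part of Zektor.io: the function name classify and the "watch" band between the healthy range and the warning sign are hypothetical, and it approximates "sustained" as the average over a window of samples. Active Connections is left out because its threshold depends on your max_connections.

```python
from __future__ import annotations

from statistics import mean

# (healthy_below, warning_above) in percent, copied from the dashboard table.
THRESHOLDS = {
    "cpu": (70.0, 80.0),
    "memory": (80.0, 90.0),
    "disk": (70.0, 80.0),
}


def classify(metric: str, samples_pct: list[float]) -> str:
    """Classify a window of percentage samples as 'healthy', 'watch', or 'warning'.

    "Sustained" is approximated as the average of the window (an assumption).
    """
    if not samples_pct:
        raise ValueError("need at least one sample")
    healthy_below, warning_above = THRESHOLDS[metric]
    avg = mean(samples_pct)
    if avg > warning_above:
        return "warning"
    if avg < healthy_below:
        return "healthy"
    return "watch"  # between the healthy range and the warning sign


# classify("cpu", [85.0, 90.0, 82.0])  -> "warning"
# classify("disk", [75.0, 75.0])       -> "watch"
```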

Metrics

PostgreSQL Metrics

Navigate to the Metrics tab of your PostgreSQL instance to view:

Metric	Description
CPU Usage	Percentage of allocated CPU being used
Memory Usage	RAM utilization including shared buffers and cache
Disk Usage	SSD storage consumed vs. allocated
WAL Storage	Write-ahead log storage usage (important for PITR)
Active Connections	Number of currently connected clients
Transactions/sec	Rate of committed transactions
Rows Read/Written	Row-level I/O activity
Replication Lag	Delay between primary and replica (if applicable)
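Transactions/sec is a rate derived from a cumulative counter. If you want to reproduce it yourself, one approach is to sample pg_stat_database.xact_commit twice and divide the difference by the elapsed time; the same arithmetic applies to the row counters (tup_returned, tup_inserted, and so on). This Python sketch assumes you already have the two samples; the class and function names are hypothetical, and whether the dashboard computes its chart exactly this way is not documented here.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """One reading of a cumulative counter, e.g. pg_stat_database.xact_commit."""
    taken_at: float  # seconds since the epoch when the sample was taken
    value: int       # counter value at that moment


def rate_per_second(earlier: CounterSample, later: CounterSample) -> float:
    """Average per-second rate between two samples of an increasing counter."""
    elapsed = later.taken_at - earlier.taken_at
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    delta = later.value - earlier.value
    if delta < 0:
        # Counters start again from zero after a statistics reset, so this
        # interval cannot be measured reliably.
        raise ValueError("counter went backwards (statistics reset?)")
    return delta / elapsed


# 6,000 commits over 60 seconds:
# rate_per_second(CounterSample(0.0, 1_000), CounterSample(60.0, 7_000))  -> 100.0
```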

Valkey Metrics

For Valkey (cache) instances:

Metric	Description
CPU Usage	CPU utilization
Memory Usage	Used memory vs. allocated
Connected Clients	Active client connections
Commands/sec	Rate of executed commands
Hit Rate	Cache hit ratio (higher is better — aim for >90%)
Evicted Keys	Number of keys evicted because of memory pressure
Keyspace Size	Total number of keys stored
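Hit Rate is conventionally derived from the keyspace_hits and keyspace_misses counters in the Stats section of Valkey's INFO command. The Python sketch below parses raw INFO text and computes the ratio; the sample text is made up, and whether the dashboard uses exactly this formula is an assumption. Both counters are cumulative since the server started, so for a recent hit rate you would diff two samples first.

```python
from __future__ import annotations

from typing import Optional


def parse_info(text: str) -> dict[str, str]:
    """Parse 'field:value' lines of a Valkey INFO reply, skipping '# Section' headers."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def hit_rate(info: dict[str, str]) -> Optional[float]:
    """Cache hit ratio in percent, or None if no key lookups have happened yet."""
    hits = int(info["keyspace_hits"])
    misses = int(info["keyspace_misses"])
    total = hits + misses
    if total == 0:
        return None
    return 100.0 * hits / total


# Illustrative INFO output (hypothetical numbers).
sample = """# Stats
keyspace_hits:9500
keyspace_misses:500
evicted_keys:12
"""
# hit_rate(parse_info(sample))  -> 95.0
```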

Common Issues and What to Check

Use this table to quickly diagnose problems:

Symptom	Metrics to Check	Likely Cause	Action
Slow queries	CPU, Rows Read	Missing indexes, full table scans	Add indexes, optimize queries
Connection errors	Active Connections	Hitting max_connections	Use connection pooling, scale up
Application timeouts	CPU, Memory	Resource saturation	Scale up to a larger tier
Growing disk usage	Disk Usage, WAL Storage	Data growth, WAL accumulation	Enable auto-scaling, remove unneeded data
High cache miss rate	Hit Rate, Memory	Working set exceeds RAM	Scale up, review TTLs
Evicted keys (Valkey)	Evicted Keys, Memory	Memory pressure	Scale up, reduce key count

Query Insights (PostgreSQL)

Using pg_stat_statements

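Enable the pg_stat_statements extension to track query performance. PostgreSQL only collects these statistics when the pg_stat_statements library is also loaded through shared_preload_libraries, so if querying the view returns an error mentioning shared_preload_libraries, see the Extensions guide: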

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

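Then inspect your most expensive queries (the total_exec_time and mean_exec_time columns exist in PostgreSQL 13 and later; earlier versions name them total_time and mean_time):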

-- Top 10 queries by total execution time
SELECT
  query,
  calls,
  round(total_exec_time::numeric, 2) AS total_time_ms,
  round(mean_exec_time::numeric, 2) AS avg_time_ms,
  rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

Finding Slow Queries

-- Queries with average execution time > 100ms
SELECT
  query,
  calls,
  round(mean_exec_time::numeric, 2) AS avg_time_ms
FROM pg_stat_statements
WHERE mean_exec_time > 100
ORDER BY mean_exec_time DESC;

Finding Frequently Called Queries

-- Most frequently called queries
SELECT
  query,
  calls,
  round(mean_exec_time::numeric, 2) AS avg_time_ms,
  rows
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 20;
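The three queries above rank the same pg_stat_statements rows by different columns, and they often disagree. Because total_exec_time equals calls multiplied by mean_exec_time, a fast query called very often can dominate total time while a rare, slow report tops the average list. The Python sketch below uses made-up rows to show the effect; the StatementStats class and top helper are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StatementStats:
    """A simplified pg_stat_statements row (hypothetical sample data)."""
    query: str
    calls: int
    mean_exec_time: float  # milliseconds

    @property
    def total_exec_time(self) -> float:
        # pg_stat_statements reports this column directly; here it is
        # reconstructed as calls * mean_exec_time.
        return self.calls * self.mean_exec_time


def top(rows: list[StatementStats], key: str, n: int = 10) -> list[str]:
    """Query texts of the n rows with the largest value of `key`."""
    ranked = sorted(rows, key=lambda r: getattr(r, key), reverse=True)
    return [r.query for r in ranked[:n]]


rows = [
    StatementStats("SELECT * FROM orders WHERE id = $1", calls=50_000, mean_exec_time=2.0),
    StatementStats("SELECT ... FROM monthly_report_view", calls=4, mean_exec_time=4_000.0),
    StatementStats("UPDATE sessions SET last_seen = now() WHERE id = $1", calls=20_000, mean_exec_time=0.5),
]

# By total time the cheap, frequent order lookup comes first (100,000 ms);
# by average time the monthly report comes first (4,000 ms per call).
```

In practice, the total-time view usually points at the biggest overall savings, while the average-time view finds individual queries that feel slow to users.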

Finding Missing Indexes

Identify tables where sequential scans dominate over index scans:

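SELECT
  relname AS table,
  seq_scan,
  idx_scan,
  n_live_tup AS est_rows,
  -- idx_scan is NULL for tables without any index; treat that as 0 scans
  round(100.0 * seq_scan / NULLIF(seq_scan + COALESCE(idx_scan, 0), 0), 1) AS seq_pct
FROM pg_stat_user_tables
WHERE seq_scan > 100
ORDER BY seq_scan DESC
LIMIT 20;

Tables with a high seq_pct (>50%) and a large est_rows value likely need indexes. A NULL idx_scan means the table has no indexes at all.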
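The rule of thumb above combines two conditions: most scans are sequential, and the table is big enough for that to be expensive. This Python sketch applies both to per-table counters (seq_scan, idx_scan, and an estimated row count such as pg_stat_user_tables.n_live_tup). The class and function names are hypothetical, and the 10,000-row cutoff is an arbitrary example, not a Zektor.io recommendation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class TableScanStats:
    """Per-table counters, e.g. from pg_stat_user_tables."""
    name: str
    seq_scan: int
    idx_scan: Optional[int]  # PostgreSQL reports NULL when the table has no indexes
    est_rows: int            # estimated live rows, e.g. n_live_tup


def seq_pct(t: TableScanStats) -> Optional[float]:
    """Share of scans that were sequential, in percent (None if no scans yet)."""
    total = t.seq_scan + (t.idx_scan or 0)  # no indexes counts as zero index scans
    if total == 0:
        return None
    return round(100.0 * t.seq_scan / total, 1)


def needs_index_review(t: TableScanStats, min_seq_pct: float = 50.0, min_rows: int = 10_000) -> bool:
    """Flag tables that are mostly scanned sequentially and large enough for it to matter."""
    pct = seq_pct(t)
    return pct is not None and pct > min_seq_pct and t.est_rows >= min_rows


# A small lookup table scanned sequentially is usually fine; a large one is not.
# needs_index_review(TableScanStats("events", 900, 100, 2_000_000))  -> True
# needs_index_review(TableScanStats("countries", 5_000, 0, 250))     -> False
```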

Resetting Statistics

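To start a fresh measurement window, for example after adding indexes, clear the accumulated statistics. By default only superusers can run this function unless EXECUTE has been granted on it:

SELECT pg_stat_statements_reset();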

Monitoring Connections

Active Connections

SELECT
  datname,
  usename,
  application_name,
  client_addr,
  state,
  query_start,
  query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY query_start;

Connection Count by State

SELECT
  state,
  count(*)
FROM pg_stat_activity
GROUP BY state;
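To judge how close you are to the limit, compare the counts from the query above with the value returned by SHOW max_connections. A few slots (superuser_reserved_connections) are held back for superusers, so ordinary roles can use slightly fewer. This Python sketch computes the percentage of usable slots in use; it assumes rows with a NULL state are background processes rather than client sessions, and the function name is hypothetical.

```python
from __future__ import annotations

from typing import Optional


def connection_usage_pct(
    counts_by_state: dict[Optional[str], int],
    max_connections: int,
    reserved: int = 0,
) -> float:
    """Percentage of usable connection slots currently taken by client sessions.

    counts_by_state mirrors the rows of the "Connection Count by State" query.
    The None key (NULL state) is assumed to be background processes and is skipped.
    `reserved` is the number of superuser-reserved slots, if you want to exclude them.
    """
    usable = max_connections - reserved
    if usable <= 0:
        raise ValueError("no usable connection slots")
    in_use = sum(n for state, n in counts_by_state.items() if state is not None)
    return 100.0 * in_use / usable


counts = {"active": 12, "idle": 60, "idle in transaction": 3, None: 6}
# connection_usage_pct(counts, 100)              -> 75.0
# connection_usage_pct(counts, 100, reserved=3)  -> about 77.3
```

Idle sessions still occupy a slot, which is why a high "idle" count is usually the first sign that connection pooling would help.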

Database Size Monitoring

Check Database Size

SELECT
  pg_database.datname,
  pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database
ORDER BY pg_database_size(pg_database.datname) DESC;

Check Table Sizes

-- format('%I.%I', ...) quotes names so mixed-case or special-character tables resolve correctly
SELECT
  tablename,
  pg_size_pretty(pg_total_relation_size(format('%I.%I', schemaname, tablename))) AS total_size,
  pg_size_pretty(pg_relation_size(format('%I.%I', schemaname, tablename))) AS table_size,
  pg_size_pretty(pg_indexes_size(format('%I.%I', schemaname, tablename))) AS index_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(format('%I.%I', schemaname, tablename)) DESC;
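Size queries give a snapshot; to see where growth is heading, sample the database size (for example with pg_database_size) on different days and project forward. The Python sketch below assumes linear growth, which is a simplification, and uses the 80% auto-scaling trigger from the dashboard table as the example threshold. The function name and sample figures are hypothetical.

```python
from __future__ import annotations

from typing import Optional


def days_until_threshold(
    threshold_bytes: int, size_then: int, size_now: int, days_between: float
) -> Optional[float]:
    """Linearly project the number of days until size_now reaches threshold_bytes.

    Returns 0.0 if the threshold is already reached and None if usage is flat
    or shrinking (no projected crossing).
    """
    if days_between <= 0:
        raise ValueError("days_between must be positive")
    if size_now >= threshold_bytes:
        return 0.0
    growth_per_day = (size_now - size_then) / days_between
    if growth_per_day <= 0:
        return None
    return (threshold_bytes - size_now) / growth_per_day


GIB = 1024 ** 3
# 100 GiB allocated, 80% threshold; grew from 60 GiB to 67 GiB over 7 days (1 GiB/day):
# days_until_threshold(80 * GIB, 60 * GIB, 67 * GIB, 7)  -> 13.0
```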

Monitor WAL Storage

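WAL storage is critical for point-in-time recovery. If WAL storage grows unexpectedly, common causes include:

High write volume, which generates more WAL
Long backup retention, which keeps more WAL segments

Long-running transactions are also worth checking: they prevent VACUUM from removing dead rows and can inflate storage. This query lists the oldest open transactions:

SELECT pid, state, xact_start, now() - xact_start AS xact_age, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 10;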
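To check whether high write volume is behind WAL growth, you can measure how much WAL is generated over an interval: record pg_current_wal_lsn() twice and take the difference. PostgreSQL can do the subtraction itself with pg_wal_lsn_diff(); the Python sketch below shows the same arithmetic for use in an external script. An LSN such as 16/B374D848 is two hexadecimal numbers, the high and low 32 bits of a byte position in the WAL stream. The function names are hypothetical.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN string such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_bytes_between(start_lsn: str, end_lsn: str) -> int:
    """Bytes of WAL written between two samples of pg_current_wal_lsn()."""
    return lsn_to_int(end_lsn) - lsn_to_int(start_lsn)


# 32 MiB of WAL between two samples:
# wal_bytes_between("0/1000000", "0/3000000")  -> 33554432
```

Dividing the result by the time between samples gives a WAL generation rate you can compare before and after a change in workload.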

Alerting

Currently, Zektor.io does not provide built-in alerting. We recommend:

Monitoring your dashboard regularly for usage spikes
Setting up external monitoring with tools like Grafana, Datadog, or custom scripts that query pg_stat_statements
Watching disk usage to avoid running out of storage

Built-in alerting is on our roadmap.
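Until built-in alerting is available, a custom script can poll a metric and alert on it. A common pattern is to fire only when several consecutive samples breach the threshold, so brief spikes are ignored, in line with the "sustained" wording in the dashboard table. This Python sketch shows only the evaluation logic; the class name, window size, and how you collect samples (for example, a query against pg_stat_activity run on a schedule) are assumptions.

```python
from collections import deque


class SustainedThresholdAlert:
    """Fire only when the last `window` samples are all above `threshold`.

    Hypothetical helper for a self-hosted monitoring script.
    """

    def __init__(self, threshold: float, window: int):
        if window < 1:
            raise ValueError("window must be at least 1")
        self.threshold = threshold
        self.samples = deque(maxlen=window)  # keeps only the most recent samples

    def observe(self, value: float) -> bool:
        """Record a sample and return True if the alert condition now holds."""
        self.samples.append(value)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and all(v > self.threshold for v in self.samples)


# CPU above 80% for three consecutive samples:
# alert = SustainedThresholdAlert(threshold=80.0, window=3)
# [alert.observe(v) for v in [85, 90, 70, 82, 88, 91]]
#   -> [False, False, False, False, False, True]
```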

Best Practices

Enable pg_stat_statements on all production databases to track query performance
Monitor disk usage regularly — auto-scaling handles growth but sudden spikes warrant investigation
Watch connection counts — approaching max_connections can cause connection failures
Review slow queries weekly to identify optimization opportunities
Track WAL storage — unexpectedly high WAL usage may indicate issues
Set up a monitoring routine — check your dashboard at least daily for production databases

Next Steps

Extensions — Install pg_stat_statements and other monitoring tools
Scaling — Upgrade resources if you're hitting limits
Connecting — Configure connection pooling to manage connections
Troubleshooting — Diagnose and fix common issues
