Monitoring
Real-time metrics, dashboards, and query insights for your Zektor.io database instances.
Overview
Zektor.io provides built-in monitoring for all database instances. Track performance metrics, query patterns, and resource utilization directly from your dashboard — no external tools required.
Instance Dashboard
Every instance has a dedicated dashboard showing real-time and historical metrics at a glance:
- Status — Current instance health (Active, Provisioning, Stopped)
- Uptime — How long your instance has been running
- Connection count — Current active connections
- Resource utilization — CPU, memory, and storage usage
Reading the Dashboard Charts
The dashboard displays time-series charts for key metrics. Here's what to look for:
| Chart | Healthy Range | Warning Signs |
|---|---|---|
| CPU Usage | Below 70% average | Sustained >80% indicates a need to scale up |
| Memory Usage | Below 80% | Sustained >90% may cause out-of-memory (OOM) errors or swapping |
| Disk Usage | Below 70% | Above 80% triggers auto-scaling (if enabled) |
| Active Connections | Well below max_connections | Nearing the limit; new connections fail once it is reached |
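If you record these readings yourself, for example from a custom script, the table can be encoded as a small classifier. The sketch below is illustrative and not part of Zektor.io. The thresholds come from the table. Two details are assumptions: readings between the two bounds get a "watch" label, and "sustained" means every sample in the window is above the warning level. Disk Usage is handled the same way, although the table gives no averaging rule for it.

```python
from statistics import mean
from typing import Dict, List, Tuple

# (healthy_below, warning_above) in percent, taken from the dashboard table
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "cpu": (70.0, 80.0),
    "memory": (80.0, 90.0),
    "disk": (70.0, 80.0),
}


def assess(metric: str, samples: List[float]) -> str:
    """Classify a window of percentage readings as 'healthy', 'watch', or 'warning'.

    Assumptions: "sustained" means every reading in the window is above the
    warning threshold, and "healthy" compares the window average to the lower
    bound. Anything in between is labeled 'watch'.
    """
    if not samples:
        raise ValueError("need at least one sample")
    healthy_below, warning_above = THRESHOLDS[metric]
    if min(samples) > warning_above:
        return "warning"
    if mean(samples) < healthy_below:
        return "healthy"
    return "watch"


print(assess("cpu", [55.0, 62.0, 68.0]))     # healthy
print(assess("cpu", [85.0, 91.0, 83.0]))     # warning
print(assess("memory", [82.0, 88.0, 95.0]))  # watch
```

A single spike above 80% does not trigger "warning" here, because the classifier requires every reading in the window to exceed the threshold. That matches the "sustained" wording in the table.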
Metrics
PostgreSQL Metrics
Navigate to the Metrics tab of your PostgreSQL instance to view:
| Metric | Description |
|---|---|
| CPU Usage | Percentage of allocated CPU being used |
| Memory Usage | RAM utilization including shared buffers and cache |
| Disk Usage | SSD storage consumed vs. allocated |
| WAL Storage | Write-ahead log storage usage (important for PITR) |
| Active Connections | Number of currently connected clients |
| Transactions/sec | Rate of committed transactions |
| Rows Read/Written | Row-level I/O activity |
| Replication Lag | Delay between primary and replica (if applicable) |
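Transactions/sec is a rate, but PostgreSQL's statistics views expose cumulative counters. For instance, pg_stat_database.xact_commit counts commits since the last statistics reset. To reproduce a per-second figure in your own scripts, sample the counter twice and divide the difference by the elapsed time. The sketch below is minimal; the helper name and the readings are made up for illustration.

```python
from typing import Optional


def per_second_rate(previous: int, current: int, elapsed_seconds: float) -> Optional[float]:
    """Turn two readings of a cumulative counter into a per-second rate.

    Returns None if the counter went backwards (for example after a statistics
    reset), because the difference between the readings is meaningless then.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    if current < previous:
        return None
    return (current - previous) / elapsed_seconds


# Two readings of a commit counter taken 60 seconds apart
print(per_second_rate(1_000, 1_600, 60.0))  # 10.0
print(per_second_rate(5_000, 200, 60.0))    # None
```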
Valkey Metrics
For Valkey (cache) instances:
| Metric | Description |
|---|---|
| CPU Usage | CPU utilization |
| Memory Usage | Used memory vs. allocated |
| Connected Clients | Active client connections |
| Commands/sec | Rate of executed commands |
| Hit Rate | Cache hit ratio (higher is better — aim for >90%) |
| Evicted Keys | Number of keys evicted due to memory pressure |
| Keyspace Size | Total number of keys stored |
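Valkey exposes the raw counters behind Hit Rate and Evicted Keys in the Stats section of the INFO command (keyspace_hits, keyspace_misses, and evicted_keys). The sketch below assumes you have captured that output as text, for example with `valkey-cli INFO stats`. The helper names and sample values are illustrative.

```python
from typing import Dict, Optional


def parse_info(text: str) -> Dict[str, str]:
    """Parse INFO output ("key:value" lines and "# Section" headers) into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def hit_rate(fields: Dict[str, str]) -> Optional[float]:
    """Cache hit ratio = hits / (hits + misses); None if no lookups happened yet."""
    hits = int(fields["keyspace_hits"])
    misses = int(fields["keyspace_misses"])
    lookups = hits + misses
    return hits / lookups if lookups else None


sample = "# Stats\r\nkeyspace_hits:9400\r\nkeyspace_misses:600\r\nevicted_keys:0\r\n"
stats = parse_info(sample)
rate = hit_rate(stats)
print(f"hit rate: {rate:.1%}")  # hit rate: 94.0%
print("evicted keys:", stats["evicted_keys"])  # evicted keys: 0
```

These counters accumulate from server start or from the last CONFIG RESETSTAT. To get a recent hit rate, compute the ratio from the difference between two snapshots.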
Common Issues and What to Check
Use this table to quickly diagnose problems:
| Symptom | Metrics to Check | Likely Cause | Action |
|---|---|---|---|
| Slow queries | CPU, Rows Read | Missing indexes, full table scans | Add indexes, optimize queries |
| Connection errors | Active Connections | Hitting max_connections | Use pooling, scale up |
| Application timeouts | CPU, Memory | Resource saturation | Scale up tier |
| Growing disk usage | Disk Usage, WAL Storage | Data growth, WAL accumulation | Enable auto-scaling, clean up |
| High cache miss rate | Hit Rate, Memory | Working set exceeds RAM | Scale up, review TTLs |
| Evicted keys (Valkey) | Evicted Keys, Memory | Memory pressure | Scale up, reduce key count |
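For the "Connection errors" row, the number that matters is how much of the connection budget is already in use. The sketch below takes the per-state counts from `SELECT state, count(*) FROM pg_stat_activity GROUP BY state`. It compares them to max_connections minus the slots PostgreSQL reserves for superusers; superuser_reserved_connections defaults to 3. Skipping rows with a NULL state, which mostly belong to background processes, is an assumption. The helper name is hypothetical.

```python
from typing import Dict, Optional


def connection_utilization(
    counts_by_state: Dict[Optional[str], int],
    max_connections: int,
    reserved: int = 3,
) -> float:
    """Fraction of non-reserved connection slots used by client sessions.

    counts_by_state mirrors the per-state pg_stat_activity count; the None key
    (NULL state, mostly background processes) is skipped. `reserved` defaults
    to 3, PostgreSQL's default superuser_reserved_connections.
    """
    available = max_connections - reserved
    if available <= 0:
        raise ValueError("max_connections must exceed the reserved slots")
    in_use = sum(n for state, n in counts_by_state.items() if state is not None)
    return in_use / available


counts = {"active": 12, "idle": 60, "idle in transaction": 5, None: 6}
print(f"{connection_utilization(counts, max_connections=100):.0%}")  # 79%
```

Idle sessions still occupy slots: in this example most of the 77 client connections are idle. That is why connection pooling is the first fix listed in the table.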
Query Insights (PostgreSQL)
Using pg_stat_statements
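Enable the pg_stat_statements extension to track query performance. PostgreSQL also requires the module to be loaded through shared_preload_libraries before its view can be queried.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
Then inspect your most expensive queries: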
-- Top 10 queries by total execution time
SELECT
query,
calls,
round(total_exec_time::numeric, 2) AS total_time_ms,
round(mean_exec_time::numeric, 2) AS avg_time_ms,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
Finding Slow Queries
-- Queries with average execution time > 100ms
SELECT
query,
calls,
round(mean_exec_time::numeric, 2) AS avg_time_ms
FROM pg_stat_statements
WHERE mean_exec_time > 100
ORDER BY mean_exec_time DESC;
Finding Frequently Called Queries
-- Most frequently called queries
SELECT
query,
calls,
round(mean_exec_time::numeric, 2) AS avg_time_ms,
rows
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 20;
Finding Missing Indexes
Identify tables where sequential scans dominate over index scans:
SELECT
relname AS table,
seq_scan,
idx_scan,
n_live_tup AS live_rows,
-- idx_scan is NULL for tables with no indexes, so treat it as 0
round(100.0 * seq_scan / NULLIF(seq_scan + COALESCE(idx_scan, 0), 0), 1) AS seq_pct
FROM pg_stat_user_tables
WHERE seq_scan > 100
ORDER BY seq_scan DESC
LIMIT 20;
Tables with a high seq_pct (>50%) and many live rows likely need indexes.
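The rule of thumb above can be applied to the query results in a script. The sketch below assumes each row is fetched as a (table, seq_scan, idx_scan, live_rows) tuple, with live_rows taken from n_live_tup in pg_stat_user_tables. The 10,000-row cutoff for "many rows" is a hypothetical value, not a Zektor.io recommendation. idx_scan is None for tables without any index, and the sketch counts that as zero index scans.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, int, Optional[int], int]  # (table, seq_scan, idx_scan, live_rows)


def index_candidates(rows: List[Row], min_rows: int = 10_000,
                     max_seq_pct: float = 50.0) -> List[str]:
    """Return tables whose scans are mostly sequential and that hold many rows.

    min_rows is a hypothetical cutoff for "many rows"; tune it to your data.
    idx_scan is None when a table has no indexes, so it counts as 0 index scans.
    """
    flagged = []
    for table, seq_scan, idx_scan, live_rows in rows:
        total = seq_scan + (idx_scan or 0)
        if total == 0:
            continue  # never scanned, nothing to judge
        seq_pct = 100.0 * seq_scan / total
        if seq_pct > max_seq_pct and live_rows >= min_rows:
            flagged.append(table)
    return flagged


rows = [
    ("orders", 5_000, 200, 250_000),    # 96% sequential, large: flagged
    ("countries", 9_000, 10, 250),      # sequential but tiny: ignored
    ("events", 800, None, 40_000),      # no indexes at all: flagged
    ("users", 300, 90_000, 120_000),    # mostly index scans: ignored
]
print(index_candidates(rows))  # ['orders', 'events']
```

Small lookup tables such as `countries` are often faster to scan sequentially, so a high seq_pct on its own is not a reason to add an index.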
Resetting Statistics
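Clear the counters collected by pg_stat_statements, for example after deploying query optimizations, so that later measurements reflect only the new behavior:
SELECT pg_stat_statements_reset();
Monitoring Connections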
Active Connections
SELECT
datname,
usename,
application_name,
client_addr,
state,
query_start,
query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY query_start;
Connection Count by State
SELECT
state,
count(*)
FROM pg_stat_activity
GROUP BY state;
Database Size Monitoring
Check Database Size
SELECT
pg_database.datname,
pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database
ORDER BY pg_database_size(pg_database.datname) DESC;
Check Table Sizes
SELECT
tablename,
pg_size_pretty(pg_total_relation_size(rel)) AS total_size,
pg_size_pretty(pg_relation_size(rel)) AS table_size,
pg_size_pretty(pg_indexes_size(rel)) AS index_size
FROM (
-- format('%I.%I', ...) quotes mixed-case or special names so the regclass cast resolves
SELECT tablename, format('%I.%I', schemaname, tablename)::regclass AS rel
FROM pg_tables
WHERE schemaname = 'public'
) AS t
ORDER BY pg_total_relation_size(rel) DESC;
Monitor WAL Storage
WAL storage is critical for point-in-time recovery. If WAL storage grows unexpectedly, look at these common causes:
- High write volume generates more WAL
- Long backup retention keeps more WAL segments
- Long-running transactions can prevent WAL cleanup. List the oldest open transactions (xact_start is NULL for sessions that are not in a transaction):
SELECT pid, state, xact_start, query FROM pg_stat_activity WHERE xact_start IS NOT NULL ORDER BY xact_start LIMIT 10;
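A script can also watch for long-running sessions over time. It computes each session's age from pg_stat_activity and flags the ones past a cutoff. The sketch below assumes rows are fetched as (pid, state, started_at) tuples, where started_at is xact_start (or query_start) as a timezone-aware datetime. The 5-minute cutoff and the helper name are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple

# (pid, state, started_at): started_at is xact_start or query_start from pg_stat_activity
Session = Tuple[int, Optional[str], Optional[datetime]]


def long_running(sessions: List[Session], now: datetime,
                 cutoff: timedelta = timedelta(minutes=5)) -> List[Tuple[int, timedelta]]:
    """Return (pid, age) for sessions older than cutoff, oldest first.

    The 5-minute default is a hypothetical cutoff. Timestamps must be timezone-aware.
    """
    result = []
    for pid, _state, started_at in sessions:
        if started_at is None:
            continue  # no open transaction or no recorded start time
        age = now - started_at
        if age > cutoff:
            result.append((pid, age))
    return sorted(result, key=lambda item: item[1], reverse=True)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sessions = [
    (101, "active", now - timedelta(seconds=30)),
    (102, "idle in transaction", now - timedelta(minutes=42)),
    (103, "active", now - timedelta(minutes=7)),
    (104, "idle", None),
]
for pid, age in long_running(sessions, now):
    print(pid, age)
# 102 0:42:00
# 103 0:07:00
```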
Alerting
Currently, Zektor.io does not provide built-in alerting. We recommend:
- Monitoring your dashboard regularly for usage spikes
- Setting up external monitoring with tools like Grafana, Datadog, or custom scripts that query pg_stat_statements
- Watching disk usage to avoid running out of storage
Built-in alerting is on our roadmap.
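For custom scripts, keeping the decision logic separate from the database driver makes it easy to test. The sketch below assumes pg_stat_statements rows are fetched as (query, calls, total_exec_time) tuples by whichever driver you use; pg_stat_statements reports times in milliseconds. The 100 ms threshold mirrors the slow-query example earlier on this page. The helper name and message format are hypothetical.

```python
from typing import List, Tuple

StatRow = Tuple[str, int, float]  # (query, calls, total_exec_time in ms)


def slow_query_alerts(rows: List[StatRow], mean_ms_threshold: float = 100.0) -> List[str]:
    """Build alert messages for queries whose mean time exceeds the threshold.

    Each message also shows the query's share of total execution time, so a
    slow-but-rare query can be weighed against frequently called ones.
    """
    grand_total = sum(total for _, _, total in rows)
    alerts = []
    for query, calls, total in sorted(rows, key=lambda r: r[2], reverse=True):
        if calls == 0:
            continue
        mean_ms = total / calls
        if mean_ms > mean_ms_threshold:
            share = total / grand_total if grand_total else 0.0
            alerts.append(f"{mean_ms:.1f} ms avg, {share:.0%} of total time: {query}")
    return alerts


rows = [
    ("SELECT * FROM orders WHERE customer_id = $1", 120_000, 240_000.0),   # 2 ms avg
    ("SELECT count(*) FROM events", 40, 60_000.0),                         # 1500 ms avg
    ("UPDATE users SET last_seen = now() WHERE id = $1", 5_000, 10_000.0),  # 2 ms avg
]
for line in slow_query_alerts(rows):
    print(line)
# 1500.0 ms avg, 19% of total time: SELECT count(*) FROM events
```

In this made-up data, the frequent 2 ms lookup accounts for about 77% of total execution time but never triggers the alert. For that reason, review the most frequently called queries alongside the slowest ones.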
Best Practices
- Enable pg_stat_statements on all production databases to track query performance
- Monitor disk usage regularly — auto-scaling handles growth but sudden spikes warrant investigation
- Watch connection counts — approaching max_connections can cause connection failures
- Review slow queries weekly to identify optimization opportunities
- Track WAL storage — unexpectedly high WAL usage may indicate issues
- Set up a monitoring routine — check your dashboard at least daily for production databases
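Regular disk checks are more useful when a script estimates how much time is left before a threshold. The sketch below assumes two usage readings in GB, taken a known number of days apart, and roughly linear growth. The 80% default matches the Disk Usage row of the dashboard table. The helper name is hypothetical.

```python
from typing import Optional


def days_until_threshold(used_then_gb: float, used_now_gb: float, days_between: float,
                         allocated_gb: float, threshold: float = 0.80) -> Optional[float]:
    """Linearly project how many days until usage crosses threshold * allocated.

    Returns 0.0 if usage is already past the threshold and None if usage is
    flat or shrinking (no crossing is projected).
    """
    if days_between <= 0:
        raise ValueError("days_between must be positive")
    limit = threshold * allocated_gb
    if used_now_gb >= limit:
        return 0.0
    growth_per_day = (used_now_gb - used_then_gb) / days_between
    if growth_per_day <= 0:
        return None
    return (limit - used_now_gb) / growth_per_day


# 50 GB allocated; usage grew from 30 GB to 33 GB over 7 days
print(round(days_until_threshold(30.0, 33.0, 7.0, 50.0), 1))  # 16.3
```

A sudden drop in the projected number of days between two checks is the kind of spike worth investigating, even with auto-scaling enabled.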
Next Steps
- Extensions — Install pg_stat_statements and other monitoring tools
- Scaling — Upgrade resources if you're hitting limits
- Connecting — Configure connection pooling to manage connections
- Troubleshooting — Diagnose and fix common issues