Grafana dashboards for pgSCV.

A week ago I announced pgSCV — a new metrics exporter for PostgreSQL. After that, some people asked me about dashboards — it will be nice to provide dashboards for pgSCV. I had a task to make dashboards in my todo list, and the request for dashboards didn’t surprise me. This is obvious, pgCSV provides a lot of metrics, and even seasoned DBAs need a lot of time to understand all of that and then make a dashboard.

I’ve made and published dashboards which cover the most metrics from pgSCV, but not the all. In this post I am going to make a quick review and tell what is inside the dashboards.

I would like to emphasize, these are initial versions of dashboards, they will be extended or changed in the future. Now there are three dashboards, because pgSCV can expose metrics about PostgreSQL, Pgbouncer and operating system. In next releases of pgSCV it is planned to collect metrics about other Postgres-related tools, like pgBackrest, Patroni and similar, hence the list of dashboards will also be extended.

I would like to emphasize, these are initial versions of dashboards, they will be extended or changed in the future. Now there are three dashboards, because pgSCV can expose metrics about PostgreSQL, Pgbouncer and operating system. In next releases of pgSCV it is planned to collect metrics about other Postgres-related tools, like pgBackrest, Patroni and similar, hence the list of dashboards will also be extended.

As it follows from title, all dashboards are made for Grafana. I am quite familiar with Grafana, and IMHO this is a well known and widely used tool for visualizing metrics. All created dashboards use Prometheus data source. But, to be honest, I made them using the Victoriametrics storage, but all queries are written using PromQL dialect. Perhaps, later I will make extra dashboards with MetricsQL.

Currently, three dashboards are available:
- PostgreSQL
- Pgbouncer
- System

Overview — this is a panel with RED metrics — requests, errors, durations — from Postgres point of view it means number of processed statements (queries), number of errors in the logs and average duration of successfully executed statements. Taking a look at this panel it is possible to understand is there any problems with processed traffic.

Activity contains charts about established connections and transactions activity. Such charts help to see bad clients with harmful activity which unwanted for Postgres.

Logs are about messages written to log files. Charts help to see when errors and other important messages occur, and also track total size of log durectory.

Statements — it contains topN charts about executed statements (based on pg_stat_statements metrics). IMHO, these are the most useful and important charts for DBA — it helps to see top-most queries which use a lot of compute or IO resources.

Logs and part of statements charts

Locks and Wait Events are about lock activity and catched wait events. These charts mostly helpful at post-incident debugging and useful for searching long locking issues.

Locks and Wait Events

Replication — it about streaming replication and connected standbys. These charts help to track replication lag between primary and standby nodes.

WAL — contains charts which help to answer on question how much WAL is generated and size of WAL directory.

Vacuum Maintenance is about autovacuum — how many workers are running and what tables are most vacuumed. Also chart about how many transactions left before XID wraparound. These charts help to see autovacuum bottlenecks and understand is autovacuum configured properly.

Background Writes is about background services — checkpointer, bgwriter. But there is also info about how much WAL and temporary files are written. Mostly these charts helps to answer on question how much data are written by Postgres.

Disk space usage is similar to system charts, but tells about space usage from Postgres point of view. Here is charts about space utilization by databases, tablespaces, tables, indexes and temporary files. It helps to understand how and by which objects space is used.

Tables is about running workload — here are charts about top tables heavily used by INSERT/UPDATE/DELETE commands. It is also regularly used charts which help to understand how workloads is changing — this is useful to check after applications deploys.

WAL Archiving is useful when archiving is configured. It shows state of archiving — success and failures and archiving lag.

System resources usage — here is brief overview about CPU and storage utilization.

Huh, there are 37 hidden charts.

Utilization is about how pools are used and is their capacity is sufficient. Using this charts you can answer on question are the pools configured properly.

Connections are about client and server connections. These charts help to see in details what connections are established and their states.

Processing contains charts about how transactions and queries are processed, about processed network traffic, and about waiting clients. These charts are useful for check performance degradation. I really want the same metrics in PostgreSQL.

Here is a seasoned DBA will not find any new stuff — this dashboard cover only necessary thing which helps to make a quick overview of system health. For deeper analyzing it is always possible to visit Grafana’s Explore and make ad-hoc query.

Overview — brief summary about resources usage — CPU, storage, network, memory and filesystems — no more.

CPU, Storage utilization here are charts about how CPUs and block devices are utilized — these are the most important resources in database lifecycle.

Storage is about block devices — latency and throughput in IOPS and bytes (including discards and flushes which added in recent Linux kernels).

Network contains charts related to network interfaces — their bandwidth usage and errors occurred.

Memory, swap — here are classic charts about how memory and swap are used.

Disk space — charts about disk space usage on mounted filesystems.

Well, that is it. As I mentioned earlier, this is initial version of dashboards, and I am sure they will be extended in a future.

If you have any suggestions, let me know. Thank you for reading.

PostgreSQL DBA