My book Mastering PostGIS and OpenStreetMap is available!

Track performance differences with pg_stat_statements

By Ryan Lambert -- Published May 06, 2023

This is my entry for PgSQL Phriday #008. It's Saturday, so I guess this is a day late! This month's topic, chosen by Michael from pgMustard, is on the excellent pg_stat_statements extension. When I saw Michael was the host this month I knew he'd pick a topic I would want to contribute on! Michael's post for his own topic provides helpful queries and good reminders about changes to columns between Postgres version 12 and 13.

In this post I show one way I like using pg_stat_statements: tracking the impact of configuration changes to a specific workload. I used a contrived change to configuration to quickly make an obvious impact.

Process to test

I am using PgOSM Flex to load Colorado OpenStreetMap data to PostGIS. PgOSM Flex uses a multi-step ETL that prepares the database, runs osm2pgsql, and then runs multiple post-processing steps. This results in 2.4 GB of data into Postgres. That should be enough activity to show something interesting.

Configuration changes

Processes like PgOSM Flex are sensitive to a few configuration options in postgresql.conf. The first configuration change I though of was maintenance_work_mem. While I was in there I decided to drop shared_buffers and work_mem to much lower values too. Just to be sure my tests showed a difference!

My local Postgres 15 instance has these settings for my daily use. This is Configuration A for this post.

shared_buffers = 4GB
work_mem = 10MB 
maintenance_work_mem = 1GB

My comparison, Configuration B, dropped each of those settings to unbearably low values. I imagine setting maintenance_work_mem = 1MB was sufficient for showing a difference.

shared_buffers = 4MB
work_mem = 1MB 
maintenance_work_mem = 1MB

Clean database

Before running each test I setup a clean database with the prerequisites and reset pg_stat_statements for a clean slate.

SELECT pg_stat_statements_reset();

At this point I run the process and wait. While I'm testing PgOSM Flex, you can test whatever you like as long as you have a scripted, reproducible processes in a controlled environment.

The following query is ran after the test run completes. This calcualtes a query_type column with some typical groupings. Some of the query_type values are project specific (e.g. WHEN query ILIKE '%CALL osm%' THEN 'PgOSM Flex processing'). I put this into a temp table so I can further query and manipulate it. This type of querying is often fairly ad-hoc, specific to the database in question.

DROP TABLE IF EXISTS queries;
CREATE TABLE queries AS
SELECT CASE WHEN query ILIKE '%COMMENT ON%'
                    OR query ILIKE 'BEGIN'
                    OR query ILIKE 'SET%'
                    OR query ILIKE '%COMMIT%'
                    OR query ILIKE '%ROLLBACK%'
                    OR query ILIKE '%SHOW%'
                    OR query ILIKE '%LOCK%TABLE%'
                THEN 'Controls/Docs'
            WHEN query ILIKE '%UPDATE%'
                OR query ILIKE '%DELETE%'
                THEN 'Update / Delete'
            WHEN query ILIKE '%COPY%'
                OR query ILIKE '%INSERT%'
                THEN 'Data Load'
            WHEN query ILIKE '%CREATE%MAT%VIEW%'
                THEN 'Create MV'
            WHEN query ILIKE '%CREATE%TABLE%'
                    OR query ILIKE '%CREATE%FUNCTION%'
                    OR query ILIKE '%CREATE%SCHEMA%'
                    OR query ILIKE '%ALTER%TABLE%'
                    OR query ILIKE '%CREATE%VIEW%'
                    OR query ILIKE '%CREATE%EXTENSION%'
                THEN 'DDL'
            WHEN query ILIKE '%DROP%TABLE%'
                    OR query ILIKE '%DROP%TRIGGER%'
                    OR query ILIKE '%DROP%FUNCTION%'
                    OR query ILIKE '%DROP%PROCEDURE%'
                THEN 'Drop objects'
            WHEN query ILIKE '%CALL osm%'
                THEN 'PgOSM Flex processing'
            WHEN query ILIKE '%CREATE%INDEX%'
                THEN 'Create Index'
            WHEN query ILIKE 'ANALYZE%'
                THEN 'Analyze (Stats)'
            WHEN query ILIKE '%SELECT%'
                THEN 'SELECT'
            ELSE 'Unknown'
            END AS query_type,
        *
    FROM pg_stat_statements
    ORDER BY mean_exec_time DESC
;

From this temp table I aggregate the data into a non-temp table so I can later compare results. For this example I create a stats schema to store the aggregated results.

CREATE SCHEMA stats;

The following query aggregates a few common values by query_type. The filter on toplevel is optional, I often include it to reduce noise. Other times I leave it out so I can see a more detailed picture of what's happening.

CREATE TABLE stats.agg_config_a AS
SELECT query_type, COUNT(*), COUNT(DISTINCT queryid) AS distinct_query_count,
        COUNT(*) FILTER (WHERE plans > 0) AS queries_with_plans,
        SUM(plans) AS total_plans,
        SUM(calls) AS total_calls,
        SUM(total_exec_time) AS total_exec_time,
        MAX(max_exec_time) AS max_exec_time,
        SUM(rows) AS total_rows,
        SUM(shared_blks_dirtied) AS shared_blks_dirtied,
        SUM(temp_blks_written) AS temp_blks_written,
        SUM(temp_blk_read_time) AS temp_blk_read_time,
        SUM(temp_blk_write_time) AS temp_blk_write_time,
        SUM(wal_records) AS wal_records,
        pg_size_pretty(SUM(wal_bytes)) AS wal_size
    FROM queries
    WHERE toplevel
    GROUP BY query_type
    ORDER BY total_exec_time DESC
;

After both tests are saved

I ran each test saving the results from pg_stat_statements into stats.agg_config_a and stats.agg_config_b. The following query compares the results of the two tests.

SELECT a.query_type, a.total_calls,
        a.total_exec_time::BIGINT AS total_time_a,
        b.total_exec_time::BIGINT AS total_time_b,
        (b.total_exec_time - a.total_exec_time)::BIGINT AS time_diff_ms,
        (b.total_exec_time - a.total_exec_time) / a.total_exec_time
            AS percent_diff
    FROM stats.agg_config_a a
    FULL JOIN stats.agg_config_b b
        ON a.query_type = b.query_type
    ORDER BY time_diff_ms DESC
;

┌───────────────────────┬─────────────┬──────────────┬──────────────┬──────────────┬──────────────────────┐
│      query_type       │ total_calls │ total_time_a │ total_time_b │ time_diff_ms │     percent_diff     │
╞═══════════════════════╪═════════════╪══════════════╪══════════════╪══════════════╪══════════════════════╡
│ Data Load             │         245 │       358854 │       393957 │        35103 │  0.09782046502939316 │
│ Create Index          │         119 │        11929 │        17155 │         5225 │  0.43799236073983533 │
│ DDL                   │         202 │         7745 │        12279 │         4534 │   0.5854177939107483 │
│ Analyze (Stats)       │          39 │         3373 │         4878 │         1505 │   0.4462918308554119 │
│ Create MV             │           3 │         1586 │         1740 │          154 │  0.09696164745742934 │
│ PgOSM Flex processing │           2 │         1413 │         1512 │           99 │  0.07015269212804308 │
│ Update / Delete       │          39 │           46 │           68 │           22 │  0.48861133692158826 │
│ Drop objects          │         166 │          118 │          135 │           17 │   0.1403158906113987 │
│ SELECT                │          30 │           65 │           76 │           11 │  0.17006003902495692 │
│ Controls/Docs         │         634 │           38 │           34 │           -5 │ -0.12137879643848994 │
└───────────────────────┴─────────────┴──────────────┴──────────────┴──────────────┴──────────────────────┘

Visuals

These results can be put into charts easily enough. The first chart shows % difference between timings of query_type. This view of the data does not consider the impact on the total time, only showing the impact by type of operation.

Chart showing the % difference in timings.

The next chart visualizes the total differences in time by query_type. While the above chart shows that data loading (COPY/INSERT) was 10% faster with Configuration A, the following chart shows that 10% difference amounted 35 seconds. On the other hand, the UPDATE/DELETE shows a 50% difference when the total impact was 22 milliseconds.

Chart showing the % difference in timings.

Summary

This post showed one way I use pg_stat_statements to track performance in PostgreSQL databases. This basic approach can be used and adjusted for a wide variety of scenarios. Thanks again Michael for hosting this month and the great topic!

My #pgsqlphriday category is getting quite the collection going!

Need help with your PostgreSQL servers or databases? Contact us to start the conversation!

By Ryan Lambert
Published May 06, 2023
Last Updated May 06, 2023