RustProof Labs: blogging for education (logo)

Using Query ID in Postgres 14

By Ryan Lambert -- Published October 15, 2021

Postgres 14 was released on September 30, 2021. With a new major version comes new features to explore! This post takes a look at the unique query id option enabled with compute_query_id in postgresql.conf. This particular backend improvement, included with Postgres 14, is one I am excited about because it makes investigating and monitoring query related performance easier. This post covers how to enable the new feature and explores how it can be used in real life performance tuning.

Enable query id

For testing I created a new instance with Postgres 14 installed and edited the postgresql.conf file to change a few configuration options related to the query id. I set compute_query_id to on instead of auto and to allow the pg_stat_statements extension to be loaded. Additionally, I turn on log_duration, set log_statement to all and update log_line_prefix to include query_id=%Q,

compute_query_id = on
shared_preload_libraries = 'pg_stat_statements'

log_duration = on
log_statement = 'all'
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h,query_id=%Q '

Using pg_stat_statements with compute_query_id = auto automatically enables this feature. I still choose to set the configuration explicitly to on so there is no ambiguity of the intended state of this feature.

Save the changes to postgresql.conf and restart the server. After Postgres has restarted I create a database named bench_test with the pg_stat_statements extension.

sudo su - postgres
createdb bench_test
psql -d bench_test -c "CREATE EXTENSION pg_stat_statements;"

Exploring query_id

This feature is not interesting with an empty database so I used pgbench to generate some data and generate some initial query traffic.

pgbench -i -s 10 bench_test
pgbench -c 5 -j 1 -s 10 -T 60 bench_test

The following query uses the pg_stat_statements view to return the 5 most frequently ran queries. The pgbench queries are definitely the most popular on this otherwise unused instance, each having been ran 111,964 times. The first column in the query is queryid followed by the parameterized query text. The query id has been in the pg_stat_statements extension for quite a while, the earliest reference I can find to this is in Postgres 9.4. Before now, however, there was really nothing to do with that ID outside of pg_stat_statements.

SELECT queryid, query, calls, total_exec_time
    FROM pg_stat_statements
    WHERE query NOT IN ('BEGIN', 'END')
    ORDER BY calls DESC
    LIMIT 5
;

┌──────────────────────┬───────────────────────────────────────────────────────────┬────────┬────────────────────┐
│       queryid        │                           query                           │ calls  │  total_exec_time   │
╞══════════════════════╪═══════════════════════════════════════════════════════════╪════════╪════════════════════╡
│  1382294904069104546 │ UPDATE pgbench_branches SET bbalance = bbalance + $1 WHER…│ 111964 │ 52527.612362999775 │
│                      │…E bid = $2                                                │        │                    │
│  6565179015390196877 │ UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE…│ 111964 │ 11572.903344999939 │
│                      │… tid = $2                                                 │        │                    │
│ -2972002344259144752 │ UPDATE pgbench_accounts SET abalance = abalance + $1 WHER…│ 111964 │  5351.522065999965 │
│                      │…E aid = $2                                                │        │                    │
│ -8632526439866242405 │ SELECT abalance FROM pgbench_accounts WHERE aid = $1      │ 111964 │ 1133.2260849999846 │
│ -3018132049962859493 │ INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)…│ 111964 │  884.2410880000045 │
│                      │… VALUES ($1, $2, $3, $4, CURRENT_TIMESTAMP)               │        │                    │
└──────────────────────┴───────────────────────────────────────────────────────────┴────────┴────────────────────┘

Setup for slow query

To illustrate the next part I need a long-running query. For this post I just need time to capture the output from the pg_stat_activity catalog table. In real life, this opportunity presents itself naturally with slow queries, but for this post using pgbench generated data I need a contrived slow query.

To setup my slow query, I pick the UPDATE query against the pgbench_branches table (query_id=1382294904069104546). The first reason I picked that query is because it is naturally the slowest of the five (5) queries identified above. The other reason I picked this query is it happens to be one that is quite easy to make it run for a long time by blocking it with another transaction.

With my "slow" query picked, I need to verify that when I manually run the query, it indeed gets the same query_id.

UPDATE public.pgbench_branches
    SET bbalance = bbalance + 1
    WHERE bid = 1
;

After running the UPDATE one time, I check pg_stat_statements to ensure the calls column is incremented by one (1). Sure enough, it now reports 111,965 instead of 111,964 shown above.

SELECT queryid, calls
    FROM pg_stat_statements
    WHERE queryid = 1382294904069104546
;

┌─────────────────────┬────────┐
│       queryid       │ calls  │
╞═════════════════════╪════════╡
│ 1382294904069104546 │ 111965 │
└─────────────────────┴────────┘

Live query vs. Query history

At this point we have setup Postgres 14 to use the new query ID option, generated some traffic against the database, and have very briefly explored the pg_stat_statements view. To continue the scenario, let's imagine someone is running a query that is taking longer than expected. The "someone" could be me personally, an analyst, or an API.

To get to a slow query and observe the effect, I will use a total of three (3) sessions (connections) to Postgres. The first session starts an explicit transaction and updates bid = 1. For this example I intentionally leave the transaction hanging open... do not do this on production!

BEGIN TRANSACTION;
UPDATE public.pgbench_branches
    SET bbalance = bbalance + 1
    WHERE bid = 1
;

In the 2nd session, I run the same UPDATE query on the same bid = 1, though this time I skip the explicit BEGIN TRANSACTION;. The second attempt to update bid = 1 will hang, blocked by the lock the 1st session is still holding due to the explicit transaction.

While the 2nd UPDATE query runs (waits!), the 3rd session can query pg_catalog.pg_stat_activity with a join to pg_stat_statements. This enables comparing details about the currently running query with collected statistics. The query and results are explained after the code.

SELECT a.pid, a.query_id,
        now() - a.query_start AS query_duration,
        a.state,
        now() - a.state_change AS state_duration,
        a.wait_event,
        ss.calls,
        ROUND(ss.total_exec_time) AS total_exec_time,
        ROUND(ss.mean_exec_time) AS mean_exec_time,
        ROUND(ss.stddev_exec_time) AS stddev_exec_time
    FROM pg_catalog.pg_stat_activity a
    INNER JOIN pg_stat_statements ss ON a.query_id = ss.queryid
    WHERE query_id = 1382294904069104546
;

┌─[ RECORD 1 ]─────┬─────────────────────┐
│ pid              │ 82832               │
│ query_id         │ 1382294904069104546 │
│ query_duration   │ 00:00:07.98645      │
│ state            │ active              │
│ state_duration   │ 00:00:07.986447     │
│ wait_event       │ transactionid       │
│ calls            │ 111976              │
│ total_exec_time  │ 189816              │
│ mean_exec_time   │ 2                   │
│ stddev_exec_time │ 252                 │
└──────────────────┴─────────────────────┘

The query_duration and state_duration are calculated from the two related timestamps in pg_stat_activity. I typically look at query_duration first (7.986 seconds) and compare that time to the mean_exec_time and stddev_exec_time, both values in milliseconds (ms). With a mean (average) execution time at only 2 ms and a standard deviation of 252 ms (1/4 second), seeing the currently running query at nearly 8 seconds does represent an obvious deviation from the normal performance expectations. The results from this query helps establish how current performance compares against historic performance.

In EXPLAIN (VERBOSE)

The Query ID also shows up when VERBOSE is used along with EXPLAIN. This will be handy when an analyst sends me a complex query asking for assistance with performance tuning. By using EXPLAIN (VERBOSE) I can quickly get the Query Identifier needed to find the related statistics in pg_stat_statments. The following example also sets COSTS OFF to simplify the output.

EXPLAIN (VERBOSE, COSTS OFF)
UPDATE public.pgbench_branches
        SET bbalance = bbalance + 1
        WHERE bid = 1
    ;

┌─────────────────────────────────────────────────────────────────────────┐
│                               QUERY PLAN                                │
╞═════════════════════════════════════════════════════════════════════════╡
│ Update on public.pgbench_branches                                       │
│   ->  Index Scan using pgbench_branches_pkey on public.pgbench_branches │
│         Output: (bbalance + 1), ctid                                    │
│         Index Cond: (pgbench_branches.bid = 1)                          │
│ Query Identifier: 1382294904069104546                                   │
└─────────────────────────────────────────────────────────────────────────┘

One reality with performance tuning is some queries do not return results in a reasonable time, taking hours or longer to complete. Using EXPLAIN without ANALYZE provides a way to get the query_id without having to wait for a query to complete or adding significant load to the instance.

In log files

Having found the query_id in pg_stat_activity and in the EXPLAIN output, that leaves the log files to explore. With log_duration = on and log_line_prefix updated as shown above, output similar to the following will come through the log files.

2021-10-14 00:24:12 UTC [73040]: [90-1] user=postgres,db=bench_test,app=psql,client=[local],query_id=1382294904069104546 LOG:  duration: 1.105 ms

To see logs associated with a particular query I can pipe the cat output to grep looking for a specific query_id value. In this example -m 5 is used to limit the output to 5 records. This gives us a way to look at individual timings that we can also compare to the aggregated statistics in pg_stat_statements.

cat /var/log/postgresql/postgresql-14-main.log \
    | grep -m 5 query_id=1382294904069104546

2021-10-13 20:34:38 UTC [73054]: [13-1] user=postgres,db=bench_test,app=pgbench,client=[local],query_id=1382294904069104546 LOG:  duration: 0.173 ms
2021-10-13 20:34:38 UTC [73055]: [13-1] user=postgres,db=bench_test,app=pgbench,client=[local],query_id=1382294904069104546 LOG:  duration: 0.246 ms
2021-10-13 20:34:38 UTC [73052]: [13-1] user=postgres,db=bench_test,app=pgbench,client=[local],query_id=1382294904069104546 LOG:  duration: 0.263 ms
2021-10-13 20:34:38 UTC [73053]: [13-1] user=postgres,db=bench_test,app=pgbench,client=[local],query_id=1382294904069104546 LOG:  duration: 0.263 ms
2021-10-13 20:34:38 UTC [73052]: [27-1] user=postgres,db=bench_test,app=pgbench,client=[local],query_id=1382294904069104546 LOG:  duration: 0.221 ms

One caveat

I encountered one caveat worth being aware of related to the log output. One of the logging related configuration changes I made was to set log_statement = 'all'. This results in the query text being logged, but unfortunately the query_id does not get set. The following output shows two log entries, the first is the result of log_statement reporting query_id=0, the second is the log output that was shown previously that includes the query_id.

2021-10-14 00:24:125 UTC [73040]: [91-1] user=postgres,db=bench_test,app=psql,client=[local],query_id=0 LOG:  statement: UPDATE public.pgbench_branches
        SET bbalance = bbalance + 1
        WHERE bid = 1
    ;
2021-10-14 00:24:12 UTC [73040]: [92-1] user=postgres,db=bench_test,app=psql,client=[local],query_id=1382294904069104546 LOG:  duration: 1.114 ms

This nuance and limitation was explained by Julien Rouhaud in the discussion for this feature :

"The reason for that is that log_statement is done right after parsing the query...

As parse analysis is not yet done, no queryid can be computed at that point, so we always print 0. That's a limitation that can't be removed without changing the semantics of log_statement, so we'll probably have to live with it."

Summary

The new query_id available throughout core components of Postgres 14 is a great addition. While backend improvements like this are not as exciting as some of the other improvements (hello multi-range data types 😍), every advantage I can get related to troubleshooting query performance is a benefit worth having. The ability to link to pg_stat_statements is welcome since I already use that extension by default. The documentation hints that other extensions can use this new ability and define custom hashing methods for determining query_id, though I have not explored those options yet.

I was initially disappointed about the limitation related to the log_statement output. After working through the examples for this post and thinking about my workflow, I do not think it matters. I am able to get the query_id from enough sources that I can tie the important pieces together even with that limitation.

Need help with your PostgreSQL servers or databases? Contact us to start the conversation!

By Ryan Lambert
Published October 15, 2021
Last Updated October 15, 2021