Using Query ID in Postgres 14
Postgres 14 was released on September 30, 2021. With a new major version comes new features to explore! This post takes a look at the unique query id option enabled with compute_query_id in postgresql.conf.
This particular backend improvement, included with Postgres 14, is one I am excited about because it makes investigating and monitoring query-related performance easier. This post covers how to enable the new feature and explores how it can be used in real-life performance tuning.
Enable query id
For testing I created a new instance with Postgres 14 installed and edited the postgresql.conf file to change a few configuration options related to the query id. I set compute_query_id to on instead of auto and set shared_preload_libraries to allow the pg_stat_statements extension to be loaded. Additionally, I turned on log_duration, set log_statement to all, and updated log_line_prefix to include query_id=%Q:
compute_query_id = on
shared_preload_libraries = 'pg_stat_statements'
log_duration = on
log_statement = 'all'
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h,query_id=%Q '
Using pg_stat_statements with compute_query_id = auto automatically enables this feature. I still choose to set the configuration explicitly to on so there is no ambiguity about the intended state of this feature.
Save the changes to postgresql.conf and restart the server. After Postgres has restarted, I create a database named bench_test with the pg_stat_statements extension.
sudo su - postgres
createdb bench_test
psql -d bench_test -c "CREATE EXTENSION pg_stat_statements;"
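Before moving on, it is worth a quick sanity check that the settings took effect after the restart. The SHOW command reports the live value of each configuration option.
psql -d bench_test -c "SHOW compute_query_id;"
psql -d bench_test -c "SHOW shared_preload_libraries;"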
Exploring query_id
This feature is not interesting with an empty database, so I used pgbench to generate some data and some initial query traffic.
pgbench -i -s 10 bench_test
pgbench -c 5 -j 1 -s 10 -T 60 bench_test
The following query uses the pg_stat_statements view to return the 5 most frequently run queries. The pgbench queries are definitely the most popular on this otherwise unused instance, each having been run 111,964 times. The first column in the query is queryid, followed by the parameterized query text.
The query id has been in the pg_stat_statements extension for quite a while; the earliest reference I can find is in Postgres 9.4. Before now, however, there was really nothing to do with that ID outside of pg_stat_statements.
SELECT queryid, query, calls, total_exec_time
FROM pg_stat_statements
WHERE query NOT IN ('BEGIN', 'END')
ORDER BY calls DESC
LIMIT 5
;
┌──────────────────────┬───────────────────────────────────────────────────────────┬────────┬────────────────────┐
│ queryid │ query │ calls │ total_exec_time │
╞══════════════════════╪═══════════════════════════════════════════════════════════╪════════╪════════════════════╡
│ 1382294904069104546 │ UPDATE pgbench_branches SET bbalance = bbalance + $1 WHER…│ 111964 │ 52527.612362999775 │
│ │…E bid = $2 │ │ │
│ 6565179015390196877 │ UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE…│ 111964 │ 11572.903344999939 │
│ │… tid = $2 │ │ │
│ -2972002344259144752 │ UPDATE pgbench_accounts SET abalance = abalance + $1 WHER…│ 111964 │ 5351.522065999965 │
│ │…E aid = $2 │ │ │
│ -8632526439866242405 │ SELECT abalance FROM pgbench_accounts WHERE aid = $1 │ 111964 │ 1133.2260849999846 │
│ -3018132049962859493 │ INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)…│ 111964 │ 884.2410880000045 │
│ │… VALUES ($1, $2, $3, $4, CURRENT_TIMESTAMP) │ │ │
└──────────────────────┴───────────────────────────────────────────────────────────┴────────┴────────────────────┘
Setup for slow query
To illustrate the next part I need a long-running query. For this post I just need time to capture the output from the pg_stat_activity view. In real life, this opportunity presents itself naturally with slow queries, but for this post using pgbench-generated data I need a contrived slow query.
To set up my slow query, I pick the UPDATE query against the pgbench_branches table (query_id=1382294904069104546). The first reason I picked that query is that it is naturally the slowest of the five (5) queries identified above. The other reason is that it is quite easy to make it run for a long time by blocking it with another transaction. With my "slow" query picked, I need to verify that when I manually run the query, it indeed gets the same query_id.
UPDATE public.pgbench_branches
SET bbalance = bbalance + 1
WHERE bid = 1
;
After running the UPDATE one time, I check pg_stat_statements to ensure the calls column is incremented by one (1). Sure enough, it now reports 111,965 instead of the 111,964 shown above.
SELECT queryid, calls
FROM pg_stat_statements
WHERE queryid = 1382294904069104546
;
┌─────────────────────┬────────┐
│ queryid │ calls │
╞═════════════════════╪════════╡
│ 1382294904069104546 │ 111965 │
└─────────────────────┴────────┘
Live query vs. Query history
At this point we have set up Postgres 14 to use the new query ID option, generated some traffic against the database, and very briefly explored the pg_stat_statements view. To continue the scenario, let's imagine someone is running a query that is taking longer than expected. The "someone" could be me personally, an analyst, or an API. To get to a slow query and observe the effect, I will use a total of three (3) sessions (connections) to Postgres.
The first session starts an explicit transaction and updates bid = 1. For this example I intentionally leave the transaction hanging open... do not do this in production!
BEGIN TRANSACTION;
UPDATE public.pgbench_branches
SET bbalance = bbalance + 1
WHERE bid = 1
;
In the 2nd session, I run the same UPDATE query on the same bid = 1, though this time I skip the explicit BEGIN TRANSACTION;. The second attempt to update bid = 1 will hang, blocked by the lock the 1st session is still holding due to the explicit transaction.
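As an aside, the pg_blocking_pids() function (available since Postgres 9.6) can confirm the blocking relationship itself. A minimal sketch, runnable from any spare session, that lists blocked backends along with who is blocking them:
SELECT pid, query_id, state,
    -- array of PIDs holding locks this backend is waiting on
    pg_blocking_pids(pid) AS blocked_by
FROM pg_catalog.pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0
;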
While the 2nd UPDATE query runs (waits!), the 3rd session can query pg_catalog.pg_stat_activity with a join to pg_stat_statements. This enables comparing details about the currently running query with collected statistics. The query and results are explained after the code.
SELECT a.pid, a.query_id,
now() - a.query_start AS query_duration,
a.state,
now() - a.state_change AS state_duration,
a.wait_event,
ss.calls,
ROUND(ss.total_exec_time) AS total_exec_time,
ROUND(ss.mean_exec_time) AS mean_exec_time,
ROUND(ss.stddev_exec_time) AS stddev_exec_time
FROM pg_catalog.pg_stat_activity a
INNER JOIN pg_stat_statements ss ON a.query_id = ss.queryid
WHERE query_id = 1382294904069104546
;
┌─[ RECORD 1 ]─────┬─────────────────────┐
│ pid │ 82832 │
│ query_id │ 1382294904069104546 │
│ query_duration │ 00:00:07.98645 │
│ state │ active │
│ state_duration │ 00:00:07.986447 │
│ wait_event │ transactionid │
│ calls │ 111976 │
│ total_exec_time │ 189816 │
│ mean_exec_time │ 2 │
│ stddev_exec_time │ 252 │
└──────────────────┴─────────────────────┘
The query_duration and state_duration are calculated from the two related timestamps in pg_stat_activity. I typically look at query_duration first (7.986 seconds) and compare that time to the mean_exec_time and stddev_exec_time, both values in milliseconds (ms). With a mean (average) execution time of only 2 ms and a standard deviation of 252 ms (1/4 second), seeing the currently running query at nearly 8 seconds represents an obvious deviation from normal performance expectations. The results from this query help establish how current performance compares against historic performance.
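That comparison can also be scripted instead of eyeballed. The following sketch expresses the live duration in milliseconds and in standard deviations from the historic mean; the columns are real, but treating anything more than a few standard deviations from the mean as suspect is my own rule of thumb, not anything built in.
SELECT a.pid, a.query_id,
    -- live duration converted to milliseconds
    ROUND((EXTRACT(EPOCH FROM (now() - a.query_start)) * 1000)::numeric) AS current_ms,
    ROUND(ss.mean_exec_time::numeric) AS mean_ms,
    -- how far the live duration sits from the historic mean, in standard deviations
    ROUND(((EXTRACT(EPOCH FROM (now() - a.query_start)) * 1000 - ss.mean_exec_time)
            / NULLIF(ss.stddev_exec_time, 0))::numeric, 1) AS deviations_from_mean
FROM pg_catalog.pg_stat_activity a
INNER JOIN pg_stat_statements ss ON a.query_id = ss.queryid
WHERE a.state = 'active'
;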
In EXPLAIN (VERBOSE)
The Query ID also shows up when VERBOSE is used along with EXPLAIN. This will be handy when an analyst sends me a complex query asking for assistance with performance tuning. By using EXPLAIN (VERBOSE) I can quickly get the Query Identifier needed to find the related statistics in pg_stat_statements. The following example also sets COSTS OFF to simplify the output.
EXPLAIN (VERBOSE, COSTS OFF)
UPDATE public.pgbench_branches
SET bbalance = bbalance + 1
WHERE bid = 1
;
┌─────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
╞═════════════════════════════════════════════════════════════════════════╡
│ Update on public.pgbench_branches │
│ -> Index Scan using pgbench_branches_pkey on public.pgbench_branches │
│ Output: (bbalance + 1), ctid │
│ Index Cond: (pgbench_branches.bid = 1) │
│ Query Identifier: 1382294904069104546 │
└─────────────────────────────────────────────────────────────────────────┘
One reality with performance tuning is that some queries do not return results in a reasonable time, taking hours or longer to complete. Using EXPLAIN without ANALYZE provides a way to get the query_id without having to wait for a query to complete or adding significant load to the instance.
In log files
Having found the query_id in pg_stat_activity and in the EXPLAIN output, that leaves the log files to explore. With log_duration = on and log_line_prefix updated as shown above, output similar to the following will come through the log files.
2021-10-14 00:24:12 UTC [73040]: [90-1] user=postgres,db=bench_test,app=psql,client=[local],query_id=1382294904069104546 LOG: duration: 1.105 ms
To see logs associated with a particular query I can pipe the cat output to grep, looking for a specific query_id value. In this example -m 5 is used to limit the output to 5 records. This gives us a way to look at individual timings that we can also compare to the aggregated statistics in pg_stat_statements.
cat /var/log/postgresql/postgresql-14-main.log \
| grep -m 5 query_id=1382294904069104546
2021-10-13 20:34:38 UTC [73054]: [13-1] user=postgres,db=bench_test,app=pgbench,client=[local],query_id=1382294904069104546 LOG: duration: 0.173 ms
2021-10-13 20:34:38 UTC [73055]: [13-1] user=postgres,db=bench_test,app=pgbench,client=[local],query_id=1382294904069104546 LOG: duration: 0.246 ms
2021-10-13 20:34:38 UTC [73052]: [13-1] user=postgres,db=bench_test,app=pgbench,client=[local],query_id=1382294904069104546 LOG: duration: 0.263 ms
2021-10-13 20:34:38 UTC [73053]: [13-1] user=postgres,db=bench_test,app=pgbench,client=[local],query_id=1382294904069104546 LOG: duration: 0.263 ms
2021-10-13 20:34:38 UTC [73052]: [27-1] user=postgres,db=bench_test,app=pgbench,client=[local],query_id=1382294904069104546 LOG: duration: 0.221 ms
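Those individual timings can also be summarized straight from the log. A quick sketch, assuming GNU grep for the -oP flag, that averages the logged durations for this query_id so the result can be compared back to mean_exec_time in pg_stat_statements:
cat /var/log/postgresql/postgresql-14-main.log \
    | grep query_id=1382294904069104546 \
    | grep -oP 'duration: \K[0-9.]+' \
    | awk '{ total += $1; n += 1 } END { if (n > 0) printf "entries: %d, avg ms: %.3f\n", n, total / n }'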
One caveat
I encountered one caveat worth being aware of related to the log output. One of the logging-related configuration changes I made was to set log_statement = 'all'. This results in the query text being logged, but unfortunately the query_id does not get set. The following output shows two log entries; the first is the result of log_statement reporting query_id=0, the second is the log output shown previously that includes the query_id.
2021-10-14 00:24:12 UTC [73040]: [91-1] user=postgres,db=bench_test,app=psql,client=[local],query_id=0 LOG: statement: UPDATE public.pgbench_branches
SET bbalance = bbalance + 1
WHERE bid = 1
;
2021-10-14 00:24:12 UTC [73040]: [92-1] user=postgres,db=bench_test,app=psql,client=[local],query_id=1382294904069104546 LOG: duration: 1.114 ms
This nuance and limitation was explained by Julien Rouhaud in the discussion for this feature: "The reason for that is that log_statement is done right after parsing the query...As parse analysis is not yet done, no queryid can be computed at that point, so we always print 0. That's a limitation that can't be removed without changing the semantics of log_statement, so we'll probably have to live with it."
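In my understanding, the duration-based logging does not share this limitation: log_min_duration_statement emits its line after execution, once the query id has been computed, so the %Q in log_line_prefix gets populated. A possible alternative configuration if the query_id=0 entries are unwanted noise (the 250 ms threshold is only an example):
log_statement = 'none'
log_min_duration_statement = 250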
Summary
The new query_id available throughout core components of Postgres 14 is a great addition. While backend improvements like this are not as exciting as some of the other improvements (hello multi-range data types 😍), every advantage I can get related to troubleshooting query performance is a benefit worth having. The ability to link to pg_stat_statements is welcome since I already use that extension by default. The documentation hints that other extensions can use this new ability and define custom hashing methods for determining query_id, though I have not explored those options yet.
I was initially disappointed about the limitation related to the log_statement output. After working through the examples for this post and thinking about my workflow, I do not think it matters. I am able to get the query_id from enough sources that I can tie the important pieces together even with that limitation.
Need help with your PostgreSQL servers or databases? Contact us to start the conversation!