PostgreSQL: Integers, on-disk order, and performance
This post examines how the on-disk order of integer data can influence performance in PostgreSQL. When working with relational databases, you often do not need to think about data storage, though there are times when these details can have a noticeable impact on your database's performance.
This post uses PostgreSQL 12.3 on Ubuntu 18.04 on a DigitalOcean droplet with 4 CPUs and 8 GB RAM, aka "Rig B" from Scaling osm2pgsql: Process and costs.
This post is part of the series PostgreSQL: From Idea to Database.
Table with two Integers
The table to work with in this post is created in the following example. The two BIGINT columns will both store random integer data, with one column having its data sorted on disk and the other having no ordering applied. In a real production table, these integer columns would likely be foreign keys frequently used in joins and/or filters. The some_data column will store some random text to add "weight" to the data being pulled back as part of this test.
CREATE TABLE integer_orders
(
ordered BIGINT NOT NULL,
not_ordered BIGINT NOT NULL,
some_data TEXT NOT NULL
);
Postgres does not have a method to automatically keep data in order on disk as new data comes in via INSERT or UPDATE commands. Other database systems, such as MS SQL Server, have the concept of clustered indexes that can keep the data ordered on disk. In Postgres, a bulk insert can write the data to disk in order by sorting the rows as they are written.
The following CTE query uses an ORDER BY
on the
ordered
column as part of the INSERT
.
The number of rows can be adjusted by altering
generate_series(1, 1000000)
.
Both the ordered and not_ordered columns are random numbers created with CEILING(random() * 20000). This results in integers in the range 1 - 20,000. Adjusting that multiplier changes the number of distinct values, providing more or less selectivity on the values.
INSERT INTO integer_orders
WITH d AS (
SELECT CEILING(random() * 20000) AS ordered,
CEILING(random() * 20000) AS not_ordered,
           repeat(md5(random()::TEXT),
                  ceil(random() * 10)::INT) AS some_data
FROM generate_series(1, 1000000) x
)
SELECT *
FROM d
ORDER BY ordered
;
-- Ensure statistics are populated
ANALYZE integer_orders;
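As a quick check of the "weight" the random text adds, the table's size can be inspected. A minimal sketch; the exact size will vary between runs because of the random repeat counts.
SELECT pg_size_pretty(pg_table_size('integer_orders')) AS table_size;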
With the data on disk stored in the order of the ordered column, the results from a basic select without an explicit ORDER BY will often (though this is not guaranteed!) return all 1's in the ordered column. The data in the not_ordered column, as expected, is in no specific order.
SELECT ordered, not_ordered, LEFT(some_data, 25) AS snippet
FROM integer_orders
LIMIT 5
;
┌─────────┬─────────────┬───────────────────────────┐
│ ordered │ not_ordered │ snippet │
╞═════════╪═════════════╪═══════════════════════════╡
│ 1 │ 10780 │ 94b23387dc624076be4aa8ddf │
│ 1 │ 3751 │ b7cfb8fdbdea4fc2f5ea0fec4 │
│ 1 │ 10066 │ c26fc1853c2e4bd64ccd264fc │
│ 1 │ 11081 │ cf0a91f262a24b06a0a8dd9c1 │
│ 1 │ 15905 │ cbcd91f08c906d58a67eec718 │
└─────────┴─────────────┴───────────────────────────┘
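To confirm the physical placement directly, the ctid system column reports each row's (page, tuple) location on disk. A quick sketch; exact ctid values will vary from run to run.
SELECT ctid, ordered, not_ordered
FROM integer_orders
LIMIT 5;
Rows with ordered = 1 should show low, adjacent page numbers, reflecting the sorted insert.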
The final step of preparation is to add indexes on both columns. My goal with this post is to test performance, so indexes are key.
CREATE INDEX ix_integer_orders_ordered
ON integer_orders (ordered);
CREATE INDEX ix_integer_orders_not_ordered
ON integer_orders (not_ordered);
A quick aggregate confirms the data in the two columns has essentially the same distribution of values. The averages are slightly different, but both land right around the midpoint of the range inserted (1 - 20,000).
SELECT AVG(ordered) AS avg_ordered, AVG(not_ordered) AS avg_not_ordered
FROM integer_orders;
┌───────────────────────┬────────────────────────┐
│ avg_ordered │ avg_not_ordered │
╞═══════════════════════╪════════════════════════╡
│ 9996.3774220000000000 │ 10004.1899420000000000 │
└───────────────────────┴────────────────────────┘
Query Plans
The first query plan we look at is from a simple SELECT query with a filter on a range of values; we will run the same filter against each of the integer columns in turn. Using EXPLAIN (without ANALYZE) tells us what Postgres thinks it will do without executing the query.
EXPLAIN
SELECT *
FROM integer_orders
WHERE ordered BETWEEN 500 AND 5000
;
The resulting plan indicates the index will be used. The estimated cost (0.42..11847.80) does not relate to time; the two numbers are the estimated startup and total costs in the planner's arbitrary units, indicating only the magnitude of work Postgres expects for that step.
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Index Scan using ix_integer_orders_ordered on integer_orders (cost=0.42..11847.80 rows=222071 width=194)
Index Cond: ((ordered >= 500) AND (ordered <= 5000))
(2 rows)
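The row estimate (rows=222071) lines up with simple arithmetic: the filter covers 4,501 of the 20,000 possible values, so roughly 225,050 of the 1 million rows should match. The same math as a quick one-liner:
-- Roughly (4501 / 20000) of 1 million rows
SELECT round(1000000 * (5000 - 500 + 1) / 20000.0) AS expected_rows;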
Run the same query, but now using the not_ordered
column in the filter
instead of the ordered
column.
EXPLAIN
SELECT *
FROM integer_orders
WHERE not_ordered BETWEEN 500 AND 5000
;
The estimated cost (3484.21..34968.31) is significantly higher than the same filter on the ordered column. Postgres estimates this query will take extra steps, starting with a Bitmap Index Scan instead of the Index Scan used previously. It also includes Recheck Cond and Bitmap Heap Scan steps.
QUERY PLAN
----------------------------------------------------------------------------------------------------
Bitmap Heap Scan on integer_orders (cost=3484.21..34968.31 rows=225140 width=194)
Recheck Cond: ((not_ordered >= 500) AND (not_ordered <= 5000))
-> Bitmap Index Scan on ix_integer_orders_not_ordered (cost=0.00..3427.93 rows=225140 width=0)
Index Cond: ((not_ordered >= 500) AND (not_ordered <= 5000))
(4 rows)
Query Actuals
How do those query plans translate to reality at run time? I am glad you asked!
Each query in this section was executed a minimum of three (3) times with the fastest plan being reported here. Timings mentioned are execution times unless otherwise specified.
These queries are the same as above, only the simple EXPLAIN is replaced with EXPLAIN (ANALYZE, BUFFERS, SETTINGS, VERBOSE).
EXPLAIN (ANALYZE, BUFFERS, SETTINGS, VERBOSE)
SELECT *
FROM integer_orders
WHERE ordered BETWEEN 500 AND 5000
;
The output now includes actual timings (Execution Time: 57.438 ms) and row counts (225246). The BUFFERS option shows the data was read from cache instead of disk (Buffers: shared hit=11064). Using SETTINGS with EXPLAIN is new in Postgres 12; it reports non-default settings influencing the query planner: Settings: jit = 'off', random_page_cost = '1.9'.
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using ix_integer_orders_ordered on public.integer_orders (cost=0.42..11847.80 rows=222071 width=194) (actual time=0.026..46.819 rows=225246 loops=1)
Output: ordered, not_ordered, some_data
Index Cond: ((integer_orders.ordered >= 500) AND (integer_orders.ordered <= 5000))
Buffers: shared hit=11064
Settings: jit = 'off', random_page_cost = '1.9'
Planning Time: 0.072 ms
Execution Time: 57.438 ms
(7 rows)
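The two settings shown are this server's non-default values, not recommendations. For reference, the session-level equivalents look like the following sketch.
SET jit = off;
SET random_page_cost = 1.9;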
Repeating the same for the not_ordered
column.
EXPLAIN (ANALYZE, BUFFERS, SETTINGS, VERBOSE)
SELECT *
FROM integer_orders
WHERE not_ordered BETWEEN 500 AND 5000
;
Results show all data for this query also resides in cache, though it is reported across two nodes in the plan. The total read from cache is 29343 buffers (619 + 28724), far greater than the 11064 reported above. Last, but not least, the timing is 106 ms vs 57 ms.
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.integer_orders (cost=3484.21..34968.31 rows=225140 width=194) (actual time=26.712..95.388 rows=225285 loops=1)
Output: ordered, not_ordered, some_data
Recheck Cond: ((integer_orders.not_ordered >= 500) AND (integer_orders.not_ordered <= 5000))
Heap Blocks: exact=28105
Buffers: shared hit=28724
-> Bitmap Index Scan on ix_integer_orders_not_ordered (cost=0.00..3427.93 rows=225140 width=0) (actual time=21.649..21.649 rows=225285 loops=1)
Index Cond: ((integer_orders.not_ordered >= 500) AND (integer_orders.not_ordered <= 5000))
Buffers: shared hit=619
Settings: jit = 'off', random_page_cost = '1.9'
Planning Time: 0.080 ms
Execution Time: 106.128 ms
(11 rows)
Verdict: Ordered is faster
The filter on a range of ordered numbers was 46% faster than the same filter on unordered numbers.
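The quick math behind the 46%, using the execution times reported above:
-- (unordered - ordered) / unordered
SELECT round(100 * (106.128 - 57.438) / 106.128, 1) AS pct_faster;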
How to think like the planner?
Look at the statistics! Postgres, like all relational databases, has a plethora of information in its meta tables. The SQL standard information lives in the information_schema schema; the Postgres-flavored metadata lives in the pg_catalog schema. The pg_catalog.pg_stats view happens to have interesting information on this topic.
"Statistical correlation between physical row ordering and logical ordering of the column values. This ranges from -1 to +1. When the value is near -1 or +1, an index scan on the column will be estimated to be cheaper than when it is near zero, due to reduction of random access to the disk." - PostgreSQL docs
Taking a peek at the correlation.
SELECT attname, n_distinct, correlation
FROM pg_catalog.pg_stats
WHERE tablename = 'integer_orders'
;
┌─────────────┬────────────┬────────────────┐
│ attname │ n_distinct │ correlation │
╞═════════════╪════════════╪════════════════╡
│ ordered │ 19804 │ 0.9999771 │
│ not_ordered │ 20024 │ 0.0040514097 │
│ some_data │ -1 │ -0.00016416173 │
└─────────────┴────────────┴────────────────┘
The correlation for the ordered column (attname) is almost 1, while the not_ordered column is almost 0. We saw above how the planner estimated lower costs for the ordered column; this lines up exactly with the expectations set by the Postgres docs.
To find columns in your database that are either low or high in correlation, use the following query. Using @ correlation provides the absolute value of the correlation so I don't need to check for positive and negative values separately.
SELECT tablename, attname, n_distinct,
CASE WHEN @ correlation < 0.3 THEN 'low'
WHEN @ correlation > 0.8 THEN 'high'
ELSE 'medium'
END AS correlation_desc
FROM pg_catalog.pg_stats
WHERE tablename NOT LIKE 'pg%' AND tablename NOT LIKE 'sql%'
;
┌────────────────┬─────────────┬────────────┬──────────────────┐
│ tablename │ attname │ n_distinct │ correlation_desc │
╞════════════════╪═════════════╪════════════╪══════════════════╡
│ integer_orders │ ordered │ 19886 │ high │
│ integer_orders │ not_ordered │ 19900 │ low │
│ integer_orders │ some_data │ -1 │ low │
└────────────────┴─────────────┴────────────┴──────────────────┘
See our PgDD extension for more examples of using
pg_catalog
information!
More targeted filtering
The previous examples used a relatively wide range of numbers for the filter, each query returning about 250k rows. Let's explore what happens when the query is filtered for a single number instead of a range.
The row counts for both queries are under 50 with a single number instead of a range. The timings are faster for both queries, yet the same trends hold true.
The query with the filter on ordered
takes a mere 0.035 ms to execute, less than the planning time!
EXPLAIN (ANALYZE)
SELECT *
FROM integer_orders
WHERE ordered = 742
;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using ix_integer_orders_ordered on integer_orders (cost=0.42..6.10 rows=50 width=194) (actual time=0.010..0.021 rows=48 loops=1)
Index Cond: (ordered = 742)
Planning Time: 0.090 ms
Execution Time: 0.035 ms
(4 rows)
The query with the filter on not_ordered is again slower, taking 0.115 ms. Notice the Heap Blocks: exact=45 line below: each of the 45 matching rows lives on a different heap page, while the 48 ordered rows above were likely packed into just a couple of adjacent pages.
EXPLAIN (ANALYZE)
SELECT *
FROM integer_orders
WHERE not_ordered = 742
;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on integer_orders (cost=2.71..96.44 rows=50 width=194) (actual time=0.029..0.095 rows=45 loops=1)
Recheck Cond: (not_ordered = 742)
Heap Blocks: exact=45
-> Bitmap Index Scan on ix_integer_orders_not_ordered (cost=0.00..2.70 rows=50 width=0) (actual time=0.020..0.020 rows=45 loops=1)
Index Cond: (not_ordered = 742)
Planning Time: 0.066 ms
Execution Time: 0.115 ms
(7 rows)
Verdict: Ordered is faster
The filter on a single ordered number was 70% faster than the same filter on the unordered column.
Use CLUSTER
to re-order
If your data is sorted on disk by ordered
, but you want to have the data sorted
by not_ordered
, that can be done with the help of the index. PostgreSQL
has the CLUSTER
command
to physically rewrite the table.
CLUSTER integer_orders USING ix_integer_orders_not_ordered;
Warning! "Clustering is a one-time operation: when the table is subsequently updated, the changes are not clustered." - PostgreSQL Docs
If you run this CLUSTER command and re-run the queries above, the timings flip so the not_ordered (but now ordered!) column is the faster of the two for filtering.
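Keep in mind CLUSTER does not refresh planner statistics on its own, so run ANALYZE before checking the new ordering. A quick sketch to verify the flip; the correlation values for the two columns should roughly swap.
ANALYZE integer_orders;
SELECT attname, correlation
FROM pg_catalog.pg_stats
WHERE tablename = 'integer_orders';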
This operation makes the most sense if the new data coming in will continue to be ordered
on the same key. Otherwise a regular maintenance job would need to be configured to
occasionally recluster the table, and that has negative side effects best avoided if possible.
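If such a job is needed anyway, a bare CLUSTER on the table reuses the index from the previous CLUSTER ... USING run, so the scheduled commands can be as simple as this sketch.
-- Re-cluster using the index recorded by the earlier CLUSTER ... USING
CLUSTER integer_orders;
ANALYZE integer_orders;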
Summary
Postgres can find integers faster when the data is in order. These quick tests showed 46-70% improvements in performance by using ordered integers for filtering.
This is a good reminder of why I like using auto-incrementing BIGINT
values for
primary keys!
Need help with your PostgreSQL servers or databases? Contact us to start the conversation!