
PostgreSQL: Integers, on-disk order, and performance

By Ryan Lambert -- Published August 05, 2020

This post examines how the on-disk order of integer data can influence performance in PostgreSQL. When working with relational databases, you often do not need to think about data storage, though there are times when these details can have a noticeable impact on your database's performance.

This post uses PostgreSQL 12.3 on Ubuntu 18.04 on a DigitalOcean droplet with 4 CPU and 8 GB RAM, aka "Rig B" from Scaling osm2pgsql: Process and costs.

This post is part of the series PostgreSQL: From Idea to Database.

Table with two Integers

The table used in this post is created with the following example. The two BIGINT columns both store random integer data; one column has its data sorted on disk while the other has no ordering applied. In a real production table, these integer columns would likely be foreign keys frequently used in joins and/or filters. The some_data column stores random text to add some "weight" to the rows pulled back as part of this test.

CREATE TABLE integer_orders
(
    ordered BIGINT NOT NULL,
    not_ordered BIGINT NOT NULL,
    some_data TEXT NOT NULL
);

Postgres does not have a method to automatically keep data in order on disk as new data comes in via INSERT or UPDATE commands. Other database systems, such as MS SQL Server, have the concept of clustered indexes that can keep the data ordered on disk. In Postgres, bulk inserts can order the data on disk by sorting the data as it is being written.

The following CTE query uses an ORDER BY on the ordered column as part of the INSERT. The number of rows can be adjusted by altering generate_series(1, 1000000). Both the ordered and not_ordered columns are random numbers created with CEILING(random() * 20000), resulting in integers in the range 1 to 20,000. The 20000 multiplier can be adjusted to provide more or less selectivity on the values.

INSERT INTO integer_orders
WITH d AS (
    SELECT CEILING(random() * 20000) AS ordered,
            CEILING(random() * 20000) AS not_ordered,
            repeat(md5(random()::TEXT),
                ceil(random() * 10)::INT) AS some_data
        FROM generate_series(1, 1000000) x
)
SELECT *
    FROM d
    ORDER BY ordered
;

-- Ensure statistics are populated
ANALYZE integer_orders;

With the data on disk stored in the order of the ordered column, a basic select without an explicit ORDER BY will often (but not guaranteed!) return all 1's in the ordered column. The data in the not_ordered column is in no particular order, as expected.

SELECT ordered, not_ordered, LEFT(some_data, 25) AS snippet
    FROM integer_orders
    LIMIT 5
;

┌─────────┬─────────────┬───────────────────────────┐
│ ordered │ not_ordered │          snippet          │
╞═════════╪═════════════╪═══════════════════════════╡
│       1 │       10780 │ 94b23387dc624076be4aa8ddf │
│       1 │        3751 │ b7cfb8fdbdea4fc2f5ea0fec4 │
│       1 │       10066 │ c26fc1853c2e4bd64ccd264fc │
│       1 │       11081 │ cf0a91f262a24b06a0a8dd9c1 │
│       1 │       15905 │ cbcd91f08c906d58a67eec718 │
└─────────┴─────────────┴───────────────────────────┘
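One way to peek at the physical side of this is the ctid system column, which reports each row's (page, tuple) location within the table. A quick sketch:

-- ctid is (page number, tuple index); with this freshly bulk-loaded
-- table, the lowest pages hold the lowest ordered values
SELECT ctid, ordered, not_ordered
    FROM integer_orders
    LIMIT 5
;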

The final step of preparation is to add indexes on both columns. My goal with this post is to test performance, so indexes are key.

CREATE INDEX ix_integer_orders_ordered
    ON integer_orders (ordered);
CREATE INDEX ix_integer_orders_not_ordered
    ON integer_orders (not_ordered);

A quick aggregate confirms the data in the two columns has basically the same distribution of values. The averages are slightly different, but both sit right around the midpoint of the range inserted (1 - 20,000).

SELECT AVG(ordered) AS avg_ordered, AVG(not_ordered) AS avg_not_ordered
    FROM integer_orders;

┌───────────────────────┬────────────────────────┐
│      avg_ordered      │    avg_not_ordered     │
╞═══════════════════════╪════════════════════════╡
│ 9996.3774220000000000 │ 10004.1899420000000000 │
└───────────────────────┴────────────────────────┘

Query Plans

The first query plans come from a simple SELECT query with a filter on each of the integer columns, filtering on a range of values. Using EXPLAIN (without ANALYZE) reports what Postgres thinks it will do; it does not execute the query.

EXPLAIN
SELECT *
    FROM integer_orders
    WHERE ordered BETWEEN 500 AND 5000
;

The resulting plan indicates the index will be used. The estimated cost (0.42..11847.80) does not relate to time; it is a unitless measure of the relative expense Postgres estimates for that step.

                                                QUERY PLAN                                                 
-----------------------------------------------------------------------------------------------------------
 Index Scan using ix_integer_orders_ordered on integer_orders  (cost=0.42..11847.80 rows=222071 width=194)
   Index Cond: ((ordered >= 500) AND (ordered <= 5000))
(2 rows)

Run the same query, but now using the not_ordered column in the filter instead of the ordered column.

EXPLAIN
SELECT *
    FROM integer_orders
    WHERE not_ordered BETWEEN 500 AND 5000
;

The estimated cost (3484.21..34968.31) is significantly higher than the same filter on the ordered column. Postgres estimates this query will take extra steps, starting with a Bitmap Index Scan instead of the Index Scan used previously, along with Recheck Cond and Bitmap Heap Scan steps.

                                             QUERY PLAN                                             
----------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on integer_orders  (cost=3484.21..34968.31 rows=225140 width=194)
   Recheck Cond: ((not_ordered >= 500) AND (not_ordered <= 5000))
   ->  Bitmap Index Scan on ix_integer_orders_not_ordered  (cost=0.00..3427.93 rows=225140 width=0)
         Index Cond: ((not_ordered >= 500) AND (not_ordered <= 5000))
(4 rows)

Query Actuals

How do those query plans translate to reality at run time? I am glad you asked!

Each query in this section was executed a minimum of three (3) times, with the fastest run reported here. Timings mentioned are execution times unless otherwise specified.

These are the same queries as above; only the simple EXPLAIN is replaced with EXPLAIN (ANALYZE, BUFFERS, SETTINGS, VERBOSE).

EXPLAIN (ANALYZE, BUFFERS, SETTINGS, VERBOSE)
SELECT *
    FROM integer_orders
    WHERE ordered BETWEEN 500 AND 5000
;

The output now includes actual timings (Execution Time: 57.438 ms) and row counts (225246). The BUFFERS option shows the data was read from cache instead of disk (Buffers: shared hit=11064). Using SETTINGS with EXPLAIN is new in Postgres 12; it reports non-default settings influencing the query planner: Settings: jit = 'off', random_page_cost = '1.9'.

                                                                            QUERY PLAN                                                                            
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using ix_integer_orders_ordered on public.integer_orders  (cost=0.42..11847.80 rows=222071 width=194) (actual time=0.026..46.819 rows=225246 loops=1)
   Output: ordered, not_ordered, some_data
   Index Cond: ((integer_orders.ordered >= 500) AND (integer_orders.ordered <= 5000))
   Buffers: shared hit=11064
 Settings: jit = 'off', random_page_cost = '1.9'
 Planning Time: 0.072 ms
 Execution Time: 57.438 ms
(7 rows)
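For reference, the settings reported by EXPLAIN can be inspected and adjusted per session. A minimal sketch, where 1.9 is simply the value this test rig uses, not a general recommendation:

-- Check the current value
SHOW random_page_cost;

-- Override for the current session only
SET random_page_cost = 1.9;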

Repeating the same for the not_ordered column.

EXPLAIN (ANALYZE, BUFFERS, SETTINGS, VERBOSE)
SELECT *
    FROM integer_orders
    WHERE not_ordered BETWEEN 500 AND 5000
;

Results show all data for this query also resides in cache, though the buffer hits are reported at two levels of the plan. The total reported read from cache is 29343 buffers (619 + 28724), far greater than the 11064 reported above. Last, but not least, the execution time is 106 ms vs 57 ms.

                                                                     QUERY PLAN                                                                      
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.integer_orders  (cost=3484.21..34968.31 rows=225140 width=194) (actual time=26.712..95.388 rows=225285 loops=1)
   Output: ordered, not_ordered, some_data
   Recheck Cond: ((integer_orders.not_ordered >= 500) AND (integer_orders.not_ordered <= 5000))
   Heap Blocks: exact=28105
   Buffers: shared hit=28724
   ->  Bitmap Index Scan on ix_integer_orders_not_ordered  (cost=0.00..3427.93 rows=225140 width=0) (actual time=21.649..21.649 rows=225285 loops=1)
         Index Cond: ((integer_orders.not_ordered >= 500) AND (integer_orders.not_ordered <= 5000))
         Buffers: shared hit=619
 Settings: jit = 'off', random_page_cost = '1.9'
 Planning Time: 0.080 ms
 Execution Time: 106.128 ms
(11 rows)

Verdict: Ordered is faster

The filter on a range of ordered numbers was 46% faster than the same filter on unordered numbers.

How to think like the planner?

Look at the statistics! Postgres, like all relational databases, has a plethora of information in its meta tables. The SQL-standard information lives in the information_schema schema; the Postgres-specific metadata lives in the pg_catalog schema. The pg_catalog.pg_stats view happens to have interesting information on this topic.

"Statistical correlation between physical row ordering and logical ordering of the column values. This ranges from -1 to +1. When the value is near -1 or +1, an index scan on the column will be estimated to be cheaper than when it is near zero, due to reduction of random access to the disk." - PostgreSQL docs

Taking a peek at the correlation.

SELECT attname, n_distinct, correlation
    FROM pg_catalog.pg_stats
    WHERE tablename = 'integer_orders'
;

┌─────────────┬────────────┬────────────────┐
│   attname   │ n_distinct │  correlation   │
╞═════════════╪════════════╪════════════════╡
│ ordered     │      19804 │      0.9999771 │
│ not_ordered │      20024 │   0.0040514097 │
│ some_data   │         -1 │ -0.00016416173 │
└─────────────┴────────────┴────────────────┘

The correlation for the ordered column (attname) is almost 1, while the not_ordered column is almost 0. We saw above how the planner estimated lower costs for the ordered column; this lines up exactly with the expectations set by the Postgres docs.

To find columns in your database that are either low or high in correlation, use the following query. Using @ correlation provides the absolute value of the correlation, so there is no need to check for positive and negative values separately.

SELECT tablename, attname, n_distinct,
        CASE WHEN @ correlation < 0.3 THEN 'low'
            WHEN @ correlation > 0.8 THEN 'high'
            ELSE 'medium'
            END AS correlation_desc
    FROM pg_catalog.pg_stats
    WHERE tablename NOT LIKE 'pg%' AND tablename NOT LIKE 'sql%'
;

┌────────────────┬─────────────┬────────────┬──────────────────┐
│   tablename    │   attname   │ n_distinct │ correlation_desc │
╞════════════════╪═════════════╪════════════╪══════════════════╡
│ integer_orders │ ordered     │      19886 │ high             │
│ integer_orders │ not_ordered │      19900 │ low              │
│ integer_orders │ some_data   │         -1 │ low              │
└────────────────┴─────────────┴────────────┴──────────────────┘

See our PgDD extension for more examples of using pg_catalog information!

More targeted filtering

The previous examples used a relatively wide range of numbers for the filter, each query returning about 250k rows. Let's explore what happens when the query is filtered for a single number instead of a range.

The row counts for both queries are under 50 when filtering on a single number instead of a range. The timings are faster for both queries, yet the same trends hold true. The query with the filter on ordered takes a mere 0.035 ms to execute, less than the planning time!

EXPLAIN (ANALYZE)
SELECT *
    FROM integer_orders
    WHERE ordered = 742
;

                                                                  QUERY PLAN                                                                  
----------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using ix_integer_orders_ordered on integer_orders  (cost=0.42..6.10 rows=50 width=194) (actual time=0.010..0.021 rows=48 loops=1)
   Index Cond: (ordered = 742)
 Planning Time: 0.090 ms
 Execution Time: 0.035 ms
(4 rows)

The query with the filter on not_ordered is again slower, taking 0.115 ms.

EXPLAIN (ANALYZE)
SELECT *
    FROM integer_orders
    WHERE not_ordered = 742
;


                                                               QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on integer_orders  (cost=2.71..96.44 rows=50 width=194) (actual time=0.029..0.095 rows=45 loops=1)
   Recheck Cond: (not_ordered = 742)
   Heap Blocks: exact=45
   ->  Bitmap Index Scan on ix_integer_orders_not_ordered  (cost=0.00..2.70 rows=50 width=0) (actual time=0.020..0.020 rows=45 loops=1)
         Index Cond: (not_ordered = 742)
 Planning Time: 0.066 ms
 Execution Time: 0.115 ms
(7 rows)

Verdict: Ordered is faster

The filter on a single ordered number was 70% faster than the same filter on an unordered number.

Use CLUSTER to re-order

If your data is sorted on disk by ordered, but you want to have the data sorted by not_ordered, that can be done with the help of the index. PostgreSQL has the CLUSTER command to physically rewrite the table.

CLUSTER integer_orders USING ix_integer_orders_not_ordered;

Warning! "Clustering is a one-time operation: when the table is subsequently updated, the changes are not clustered." - PostgreSQL Docs

If you were to run this CLUSTER command and re-run the queries above, the timings would flip so the not_ordered (but now ordered!) column is the faster of the two for filtering. This operation makes the most sense if the new data coming in will continue to be ordered on the same key. Otherwise, a regular maintenance job would need to occasionally re-cluster the table, and CLUSTER locks the table exclusively while it rewrites the data, a side effect best avoided if possible.
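If periodic re-clustering is unavoidable, later runs can omit the index name. A minimal sketch:

-- Re-clusters on the index recorded by the previous CLUSTER run;
-- the table is locked exclusively while it is rewritten
CLUSTER integer_orders;

-- Refresh planner statistics after the rewrite
ANALYZE integer_orders;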

Summary

Postgres can find integers faster when the data is in order. These quick tests showed 46 - 70% improvements in performance by using ordered integers for filtering. This is a good reminder of why I like using auto-incrementing BIGINT values for primary keys!
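As an illustration only (this table and its columns are hypothetical, not from the tests above), an identity column hands out ever-increasing values, so newly inserted rows stay naturally correlated with their on-disk order:

-- Hypothetical example table: the BIGINT identity primary key
-- increases with each insert, keeping value order aligned with insert order
CREATE TABLE orders
(
    order_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);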

Need help with your PostgreSQL servers or databases? Contact us to start the conversation!
