RustProof Labs: blogging for education (logo)

PostgreSQL 12 and PostGIS 3 Initial Review

By Ryan Lambert -- Published November 13, 2019

Postgres 12 and PostGIS 3 have been out and production ready for about a month now. I have been testing the upgrade from Pg 11 to 12 and have evaluated various workloads, mainly looking for red flags that would hold up an upgrade. At this point I am happy with what I see and am planning the upgrade for all our Postgres databases! This post covers a few points I found interesting during my initial testing. See the Postgres 12 release notes and PostGIS 3 release notes for more details about what has changed in these versions.

Reasons to upgrade

The main feature making me want to upgrade to Postgres 12 is stored generated columns. This allows columns to store a calculated value to disk, much like materialized views store the results of queries. Not all our Pg instances have plans to use the feature at this time, so I could hold some back on Pg11 if I wanted. I decided we will upgrade all instances anyway, largely because with the changes to streaming replication it will be easier and safer to not have to keep two sets of procedures.

The other main feature I have had an eye on is the ability to create covering GIST indexes. The idea here is that a GIST index can be used to find appropriate records from the database, and the INCLUDE allows adding other columns to be included in the index and avoid pulling them from disk.

Generated columns

My first use for generated columns in Postgres 12 is to combine components of physical addresses into a single field. This is helpful with data I am geocoding with PostGIS' Tiger geocoder. The data typically comes in with the address split out into its individual components (street, city, state, zip), while the address normalizer wants a single string. The following example will add a generated column named address_full to automatically combine the fields from their sub-components.

ALTER TABLE co_businesses
    ADD address_full TEXT
    GENERATED ALWAYS AS (address1 || ', ' || city || ', ' || state_code || ' ' || zip)
    STORED;

Note: This assumes each individual field is set to NOT NULL. If values can contain NULL you should use COALESCE() or the entire string will be empty.

Covering GIST Indexes

The basic syntax to create a covering index:

CREATE INDEX GIX_building_covering_with_name
    ON osm.building_polygon USING GIST (way)
    INCLUDE (name)
;

I did some quick testing on existing data and queries but was unable to find a query that took advantage of this. I need to put in more time into testing the nuances of this feature before I give up on it.

Performance

My initial impression of Postgres 12's performance is it is roughly consistent with Pg11 on the same hardware. Some parts are a bit faster (creating GIST indexes is about 6% faster), other parts are a bit slower (bulk loading data). Kaarel Moppel found a slight decline in performance and attributes much of that to the overhaul of the storage layer.

"... the average performance actually decreased slightly! Well, it’s only a modest 3.3% for the mean times .... And it can of course be totally normal and expected, since the storage architecture was overhauled ..."

Query to examine

I settled on one query to examine EXPLAIN (ANALYZE) output that provides a few interesting points.

The basic query is to find points (trees) within a buffer around a building. In this case we filter for a specific building and find trees within 1km (1000m) of the building. The underlying tables are loaded with OpenStreetMap data for Colorado (read how to load and transform). The way columns in each table have appropriate GIST indexes but no index is on the building name column, or the natural column from the point table.

SELECT t.natural, t.way
    FROM osm.building_polygon b
    INNER JOIN osm.natural_point t
        ON t.natural = 'tree' 
            AND ST_Contains(ST_Buffer(b.way, 1000), t.way)
    WHERE b.name = 'Pangea Coffee Roasters'
;

The query was ran on two server configurations, a Digital Ocean droplet with 4 CPU and 8 GB RAM, and a local VM with 4 CPU and 4 GB RAM. The VM and the DO servers were loaded with OSM data loaded on different dates, so row counts do not match identically. The output from the four (4) EXPLAIN (ANALYZE) outputs was loaded into https://explain.depesz.com linked below.

EXPLAIN observation: Updated output

Output of EXPLAIN with GIST indexes now uses @ instead of ~. Also, Pg12 reorders the condition itself, I'm not sure if that's purely cosmetic or an indicator that it is reordering for efficiency.

Postgres 11 / PostGIS 2.5 index condition:

Index Cond: (st_buffer(b.way, '1000'::double precision) ~ way)

Postgres 12 / PostGIS 3 index condition:

Index Cond: (way @ st_buffer(b.way, '1000'::double precision, ''::text))

EXPLAIN observation: Goodbye, recheck

Postgres 11 and PostGIS 2.5 (and earlier) have always had a Recheck condition listed when filtering on a GIST index. With Postgres 12 and PostGIS 3 (both minimum required) the need to recheck disappears. An example of this output:

Recheck Cond: (st_buffer(b.way, '1000'::double precision) ~ way)

It appears this change is a result of this commit, thanks to Jeff Janes for helping find that.

Key takeaways

Summary

Postgres 12 and PostGIS 3 are both great steps forward! This post focused on a few key details and early observations. There will be much more to write about after our production upgrades are completed!

My first goal with testing was to confirm there wasn't a major performance regression. Second, I needed to ensure the features I'm upgrading for will work the way I intend them to. All in all, success!

Need help with your PostgreSQL servers or databases?

Contact us to start the conversation!

By Ryan Lambert
Published November 13, 2019
Last Updated November 13, 2019