
Round Two: Partitioning OpenStreetMap

By Ryan Lambert -- Published February 26, 2021

A few weeks ago I decided to seriously consider Postgres' declarative table partitioning for our OpenStreetMap data. Once the decision was made to investigate this option, I outlined our use case with requirements to keep multiple versions of OpenStreetMap data over time. That process helped draft my initial plan for how to create and manage the partitioned data. When I put the initial code to the test I found a snag and adjusted the plan.

This post shows a working example of how to partition OpenStreetMap data loaded using PgOSM-Flex.

TLDR

Spoiler alert!

It works, I love it! I am moving forward with the plan outlined in this post. The highlights from testing with Colorado-sized data are covered throughout the post.

Setup new partition

At the end of the previous post I had decided to simplify and use only osm_date for the partition key. The following code creates the new partitioned table, still using the LIST-based approach.

CREATE SCHEMA osmp;
COMMENT ON SCHEMA osmp IS 'Objects for partitioned OpenStreetMap data';

CREATE TABLE osmp.road_line (
    osm_id BIGINT NOT NULL,
    osm_type TEXT NOT NULL,
    "name" TEXT NULL,
    "ref" TEXT NULL,
    maxspeed INT NULL,
    oneway SMALLINT NULL,
    layer INT NOT NULL,
    tunnel TEXT NULL,
    bridge TEXT NULL,
    major BOOLEAN NOT NULL,
    route_foot BOOLEAN NULL,
    route_cycle BOOLEAN NULL,
    route_motor BOOLEAN NULL,
    geom GEOMETRY(LINESTRING, 3857) NULL,
    osm_date DATE NOT NULL,
    CONSTRAINT pk_osm_road_line_osm_id_osm_date
        PRIMARY KEY (osm_date, osm_id)
) PARTITION BY LIST (osm_date)
;
COMMENT ON TABLE osmp.road_line IS 'Partitioned OpenStreetMap roads, structure matches PgOSM-Flex v0.1.1.';

The steps to add a new partition using osm_date as the partition key are below. First, add the osm_date column, populate it, and set it NOT NULL.

ALTER TABLE osm.road_line ADD osm_date DATE NULL;

UPDATE osm.road_line r
    SET osm_date = f.osm_date
    FROM osm.pgosm_flex f 
    WHERE r.osm_date IS NULL
;

ALTER TABLE osm.road_line
    ALTER COLUMN osm_date
    SET NOT NULL;

Remove the original single-column PRIMARY KEY and create a new PRIMARY KEY that includes the osm_date partition key.

ALTER TABLE osm.road_line
    DROP CONSTRAINT pk_osm_road_line_osm_id
;
ALTER TABLE osm.road_line
    ADD CONSTRAINT pk_osm_road_line_osm_date_osm_id
    PRIMARY KEY (osm_date, osm_id)
;

Adding and populating the osm_date column rewrote every row, bloating the existing indexes on the osm.road_line table. Reindex the table.

REINDEX TABLE osm.road_line;

The previous post showed a query to look at index sizes. Before making any changes to the road_line table, the GIST index road_line_geom_idx took 44 MB on disk.
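The query itself is not repeated here; as a sketch (not necessarily the exact query from the previous post), something along these lines against the pg_catalog system tables produces the same columns:

```sql
-- Sketch of an index-size query; schema/table filters are illustrative
SELECT n.nspname AS s_name,
        t.relname AS t_name,
        i.relname AS index_name,
        pg_size_pretty(pg_relation_size(i.oid)) AS index_size,
        pg_relation_size(i.oid) AS index_size_bytes
    FROM pg_catalog.pg_index x
    INNER JOIN pg_catalog.pg_class i ON i.oid = x.indexrelid
    INNER JOIN pg_catalog.pg_class t ON t.oid = x.indrelid
    INNER JOIN pg_catalog.pg_namespace n ON n.oid = t.relnamespace
    WHERE n.nspname = 'osm'
        AND t.relname = 'road_line'
;
```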

┌────────┬───────────┬──────────────────────────┬────────────┬──────────────────┐
│ s_name │  t_name   │        index_name        │ index_size │ index_size_bytes │
╞════════╪═══════════╪══════════════════════════╪════════════╪══════════════════╡
│ osm    │ road_line │ road_line_geom_idx       │ 44 MB      │         45875200 │
└────────┴───────────┴──────────────────────────┴────────────┴──────────────────┘

After creating and populating the osm_date column the index has more than doubled in size as seen in the following table. The above REINDEX TABLE command will resolve this index bloat.

┌────────┬───────────┬────────────────────┬────────────┬──────────────────┐
│ s_name │  t_name   │     index_name     │ index_size │ index_size_bytes │
╞════════╪═══════════╪════════════════════╪════════════╪══════════════════╡
│ osm    │ road_line │ road_line_geom_idx │ 91 MB      │         95731712 │
└────────┴───────────┴────────────────────┴────────────┴──────────────────┘

Create a matching CHECK constraint before attaching the partition so Postgres can skip the full-table scan it would otherwise perform to validate the partition constraint during ATTACH PARTITION. Drop the CHECK constraint after attaching; it is redundant with the partition constraint.

ALTER TABLE osm.road_line
    ADD CONSTRAINT ck_osm_co_road_line_partition
    CHECK ( osm_date = '2021-02-24' )
;

ALTER TABLE osmp.road_line
    ATTACH PARTITION osm.road_line
    FOR VALUES IN ( '2021-02-24' )
;

ALTER TABLE osm.road_line
    DROP CONSTRAINT ck_osm_co_road_line_partition;

With the new table attached, the steps that rely on the osm schema name are complete. Rename the schema to make room for future data loads.

ALTER SCHEMA osm RENAME TO osm_20210224;

Verify Query performance

While query performance is not the reason I decided to give partitioning a try, it is important to ensure partitioning does not severely degrade performance. Luckily, queries against the partitioned tables are generally as fast, if not faster, than with the "stacked" method. The plans from one query counting major roads show the partitioned query running 75% faster than the same query on the stacked table.

The partitioned query uses the index on the major column while the stacked query chooses not to use it.
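The comparison query was not shown above; as a sketch (the exact query text is an assumption), counting major roads and inspecting the plan looks like this:

```sql
-- Hypothetical reconstruction of the major-roads count;
-- run the same query against the stacked table to compare plans
EXPLAIN (ANALYZE)
SELECT COUNT(*)
    FROM osmp.road_line
    WHERE major
;
```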

Data loading

My initial testing shows that preparing partitioned OpenStreetMap data takes 20% longer in wall-clock time. In reality, that wall-clock time does not matter much. Preliminary testing on larger regions shows the pattern holds as data sizes increase.

WAL generated during import

A more important way to look at the impact of this data ingestion is to examine how much WAL is generated. One way to check this is to make a note of the current WAL location before the load and check it again after the load. The pg_current_wal_lsn() function can be used to track this. For easy reuse I'll save the results of pg_current_wal_lsn() to a table.

DROP TABLE IF EXISTS wal_check;
CREATE UNLOGGED TABLE wal_check AS
SELECT pg_current_wal_lsn() AS wal_lsn
;

Query the table to see the recorded position; the raw LSN value does not mean much to us humans.

SELECT * FROM wal_check;

┌─────────────┐
│   wal_lsn   │
╞═════════════╡
│ 12/4C4D7F60 │
└─────────────┘

After saving the position of the WAL, I loaded a new Colorado partition. The following query reports how much WAL has been generated since the position was recorded; it shows 623 MB of WAL.

SELECT pg_size_pretty(pg_current_wal_lsn() - w.wal_lsn) AS wal_size
    FROM wal_check w
;

┌──────────┐
│ wal_size │
╞══════════╡
│ 623 MB   │
└──────────┘

Next I dropped and recreated the table tracking the WAL position and loaded the same batch of Colorado data into the stacked table. Loading the stacked table generated 751 MB of WAL.

The partitioned table generates 17% less WAL than loading the same data into the stacked table.

Deleting data

While I want to save snapshots of data over time, I also have no intention of keeping every version I load. This means I need to delete data in bulk!

Deleting data generates WAL, often in surprising amounts. This adds load on the disk (it has to write the WAL) and the network (WAL shipping, streaming replication). The test database used here is small, but even with Colorado-sized OSM data, tracking the WAL highlights the benefit of partitioning.

DELETE FROM osms.road_line WHERE osm_date = '2021-02-26';
DELETE 814705
Time: 1581.329 ms (00:01.581)

This DELETE statement generates 300 MB of WAL.

┌────────────────┐
│ pg_size_pretty │
╞════════════════╡
│ 300 MB         │
└────────────────┘

By comparison, the equivalent operations to remove a partition are a DETACH followed by a DROP.

ALTER TABLE osmp.road_line DETACH PARTITION osm_20210226.road_line;
Time: 60.112 ms

DROP TABLE osm_20210226.road_line;
Time: 67.216 ms

How much WAL does that generate? Less than 1 MB! That is 99.8% less WAL generated by removing the partition instead of deleting from the stacked table. This is the type of benefit I had been looking for.

┌────────────────┐
│ pg_size_pretty │
╞════════════════╡
│ 674 kB         │
└────────────────┘

Note: Creating a table to track the WAL inadvertently generates WAL in the process! 18 kB of the 674 kB reported above was WAL from creating the table to track the WAL usage.
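One way to sidestep that extra WAL is to keep the starting LSN in a psql client-side variable instead of a table. This is an alternative to the tracking-table approach above, not something from the original workflow, using psql's \gset meta-command:

```sql
-- Store the current WAL position in a psql variable (no table, no WAL)
SELECT pg_current_wal_lsn() AS wal_lsn_before \gset

-- ... run the data load or DELETE being measured ...

-- Report WAL generated since the position was recorded
SELECT pg_size_pretty(pg_current_wal_lsn() - :'wal_lsn_before'::PG_LSN)
    AS wal_size;
```

Note that \gset takes the place of the statement-terminating semicolon and names the variable after the output column.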

Dump and Restore in Parallel

Another area of maintenance that can benefit from table partitioning is backup and restore operations. In Postgres, a table's data is generally stored in its own file (split into 1 GB segments for large tables). Having one big table in one big file results in backup/restore operations getting stuck using a single CPU. Testing this with pg_dump and pg_restore using directory mode (-Fd) allows using multiple processes (-j3). Example commands for the dump/restore might look like the following.

pg_dump -Fd -f /path/to/backup -j3 -d db_name

pg_restore -Fd -j3 -d db_name /path/to/backup

The database with the partitioned table was 57% faster with pg_dump and 33% faster with pg_restore. I have not tested this with pgBackRest yet, but assume similar benefits will be observed there as well.

Summary

Postgres' table partitioning is going to be a great benefit to our long-term OpenStreetMap data management! My main goal for improvement was bulk loading and removing data, and that has panned out as hoped. While query performance was not a primary goal, it is always exciting to see queries running 75% faster. My hope for query performance was simply to avoid the decline I found in the previous post.

It does deserve one last word of caution: proper attention must be given to the partitioning scheme to ensure the implementation meets your end use goals. A partitioning scheme that does not align with your query or data ingestion pattern will likely cause frustration. Plan, test, adjust, re-test, etc., until you find what works for you!

Need help with your PostgreSQL servers or databases? Contact us to start the conversation!

Last Updated February 26, 2021