Round Two: Partitioning OpenStreetMap
A few weeks ago I decided to seriously consider Postgres' declarative table partitioning for our OpenStreetMap data. Once the decision was made to investigate this option, I outlined our use case with requirements to keep multiple versions of OpenStreetMap data over time. That process helped draft my initial plan for how to create and manage the partitioned data. When I put the initial code to the test I found a snag and adjusted the plan.
This post shows a working example of how to partition OpenStreetMap data loaded using PgOSM-Flex.
It works, I love it! I am moving forward with the plan outlined in this post. Some highlights from testing with Colorado sized data:
- Bulk import generates 17% less WAL
- Bulk delete generates 99.8% less WAL
- Simple aggregate query runs 75% faster
Setup new partition
At the end of the previous post I had decided to simplify and use only the `osm_date` column for the partition key. The following code creates the new partitioned table, still using the LIST based approach.
```sql
CREATE SCHEMA osmp;
COMMENT ON SCHEMA osmp IS 'Objects for partitioned OpenStreetMap data';

CREATE TABLE osmp.road_line (
    osm_id BIGINT NOT NULL,
    osm_type TEXT NOT NULL,
    "name" TEXT NULL,
    "ref" TEXT NULL,
    maxspeed INT NULL,
    oneway SMALLINT NULL,
    layer INT NOT NULL,
    tunnel TEXT NULL,
    bridge TEXT NULL,
    major BOOLEAN NOT NULL,
    route_foot BOOLEAN NULL,
    route_cycle BOOLEAN NULL,
    route_motor BOOLEAN NULL,
    geom GEOMETRY(LINESTRING, 3857) NULL,
    osm_date DATE NOT NULL,
    CONSTRAINT pk_osm_road_line_osm_id_osm_date
        PRIMARY KEY (osm_date, osm_id)
) PARTITION BY LIST (osm_date)
;
COMMENT ON TABLE osmp.road_line IS 'Partitioned OpenStreetMap roads, structure matches PgOSM-Flex v0.1.1.';
```
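As an aside, a partition for a specific date could also be created directly under the parent instead of attaching an existing table. A minimal sketch (the partition name is hypothetical; this post uses the attach approach instead):

```sql
CREATE TABLE osmp.road_line_20210224
    PARTITION OF osmp.road_line
    FOR VALUES IN ('2021-02-24');
```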
The steps to add a new partition using `osm_date` as the partition key are below. First, add the `osm_date` column to the freshly loaded table, update it, and set it `NOT NULL`.
```sql
ALTER TABLE osm.road_line ADD osm_date DATE NULL;

UPDATE osm.road_line r
    SET osm_date = f.osm_date
    FROM osm.pgosm_flex f
    WHERE r.osm_date IS NULL
;

ALTER TABLE osm.road_line ALTER COLUMN osm_date SET NOT NULL;
```
Remove the original single-column `PRIMARY KEY` and create a new `PRIMARY KEY` that includes the `osm_date` partition key.
```sql
ALTER TABLE osm.road_line
    DROP CONSTRAINT pk_osm_road_line_osm_id
;
ALTER TABLE osm.road_line
    ADD CONSTRAINT pk_osm_road_line_osm_date_osm_id
    PRIMARY KEY (osm_date, osm_id)
;
```
Earlier, when the `osm_date` column was added and populated, every row was rewritten, bloating the existing indexes on the `osm.road_line` table. Reindex the table to remove the bloat.

```sql
REINDEX TABLE osm.road_line;
```
The previous post showed a query to look at index sizes. Before making any changes to the `road_line` table, the GIST index `road_line_geom_idx` took 44 MB on disk.
```
┌────────┬───────────┬────────────────────┬────────────┬──────────────────┐
│ s_name │  t_name   │     index_name     │ index_size │ index_size_bytes │
╞════════╪═══════════╪════════════════════╪════════════╪══════════════════╡
│ osm    │ road_line │ road_line_geom_idx │ 44 MB      │ 45875200         │
└────────┴───────────┴────────────────────┴────────────┴──────────────────┘
```
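A query along these lines reproduces that output. This is a sketch using the `pg_stat_user_indexes` view and `pg_relation_size()`, with column aliases chosen to match the table shown here; the previous post's exact query may differ:

```sql
SELECT s.schemaname AS s_name,
        s.relname AS t_name,
        s.indexrelname AS index_name,
        pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size,
        pg_relation_size(s.indexrelid) AS index_size_bytes
    FROM pg_stat_user_indexes s
    WHERE s.relname = 'road_line'
;
```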
After creating and populating the `osm_date` column the index has more than doubled in size, as seen in the following table. The `REINDEX TABLE` command above resolves this index bloat.
```
┌────────┬───────────┬────────────────────┬────────────┬──────────────────┐
│ s_name │  t_name   │     index_name     │ index_size │ index_size_bytes │
╞════════╪═══════════╪════════════════════╪════════════╪══════════════════╡
│ osm    │ road_line │ road_line_geom_idx │ 91 MB      │ 95731712         │
└────────┴───────────┴────────────────────┴────────────┴──────────────────┘
```
Add a `CHECK` constraint before attaching the partition; this lets `ATTACH PARTITION` skip the validation scan while holding its lock on the parent table. Drop the `CHECK` constraint after attaching because it is redundant.
```sql
ALTER TABLE osm.road_line
    ADD CONSTRAINT ck_osm_co_road_line_partition
    CHECK (osm_date = '2021-02-24')
;
ALTER TABLE osmp.road_line
    ATTACH PARTITION osm.road_line
    FOR VALUES IN ('2021-02-24')
;
ALTER TABLE osm.road_line
    DROP CONSTRAINT ck_osm_co_road_line_partition;
```
With the new table attached, that completes the steps that rely on the default `osm` schema name. Rename the schema to make room for future data loads.
```sql
ALTER SCHEMA osm RENAME TO osm_20210224;
```
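To confirm the partition is attached under the parent, the `pg_inherits` catalog can be queried. A quick sketch using the table names from this example:

```sql
-- Lists every partition attached to osmp.road_line
SELECT inhrelid::regclass AS partition_name
    FROM pg_inherits
    WHERE inhparent = 'osmp.road_line'::regclass
;
```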
Verify query performance
While query performance is not the reason I decided to give partitioning a try, it is important to ensure partitioning does not severely degrade performance. Luckily, performance against the partitioned tables is generally as fast, if not faster, than the "stacked" method. The plans from one query counting major roads show the partitioned query running 75% faster than the same query on the stacked table.
The partitioned query uses the index on the `major` column while the stacked query chooses not to use it.
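Partition pruning can be verified with `EXPLAIN`: when the partition key appears in the `WHERE` clause, the plan should only touch the matching partition. A sketch, assuming the column names above (the exact query tested in this post may differ):

```sql
-- The plan should reference only the 2021-02-24 partition,
-- not every partition under osmp.road_line.
EXPLAIN
SELECT COUNT(*)
    FROM osmp.road_line
    WHERE osm_date = '2021-02-24'
        AND major
;
```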
My initial testing shows that the preparation for partitioned OpenStreetMap data takes 20% longer in wall-clock time. In reality, that wall-clock time does not matter much. Preliminary testing on larger regions shows the pattern seems to hold as data size increases.
WAL generated during import
A more important way to look at the impact of this data ingestion is to examine how much WAL is generated. One way to check this is to make a note of the current WAL location before the load and check it again after the load. The `pg_current_wal_lsn()` function can be used to track this.
For easy reuse I'll save the results of `pg_current_wal_lsn()` to a table.
```sql
DROP TABLE IF EXISTS wal_check;
CREATE UNLOGGED TABLE wal_check AS
SELECT pg_current_wal_lsn() AS wal_lsn
;
```
Query the table to see the recorded position; it does not mean much to us humans.
```
SELECT * FROM wal_check;

┌─────────────┐
│   wal_lsn   │
╞═════════════╡
│ 12/4C4D7F60 │
└─────────────┘
```
After saving the position of the WAL, I loaded a new Colorado partition. The following query reports how much WAL has been generated since the above query was run; it shows 623 MB of WAL was generated.
```
SELECT pg_size_pretty(pg_current_wal_lsn() - w.wal_lsn) AS wal_size
    FROM wal_check w
;

┌──────────┐
│ wal_size │
╞══════════╡
│ 623 MB   │
└──────────┘
```
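The subtraction works because `pg_lsn` values support arithmetic directly; subtracting two LSNs yields the number of bytes between them, which `pg_size_pretty()` makes readable. A standalone sketch with made-up LSN values:

```sql
-- Subtracting pg_lsn values returns a byte count (numeric)
SELECT pg_size_pretty('12/6B4D7F60'::pg_lsn - '12/4C4D7F60'::pg_lsn)
    AS wal_size
;
```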
Next I dropped and recreated the table tracking WAL position and loaded the same batch of Colorado data to the stacked table. Loading the stacked table generated 751 MB of WAL.
The partitioned table generates 17% less WAL than loading the same data into the stacked table.
While I want to save snapshots of data over time, I also have no intention of keeping every version I load. This means I need to delete data in bulk!
Deleting data generates WAL, often in surprising amounts. This adds to the load on the disk (it has to write the WAL) and network (WAL shipping, streaming replication). The test database I have used here is small but even here with Colorado-sized OSM data, tracking the WAL highlights the benefit to partitioning.
```
DELETE FROM osms.road_line WHERE osm_date = '2021-02-26';

DELETE 814705
Time: 1581.329 ms (00:01.581)
```
The `DELETE` statement generated 300 MB of WAL.
```
┌────────────────┐
│ pg_size_pretty │
╞════════════════╡
│ 300 MB         │
└────────────────┘
```
By comparison, the equivalent commands to remove a partition are below.
```
ALTER TABLE osmp.road_line DETACH PARTITION osm_20210226.road_line;
Time: 60.112 ms

DROP TABLE osm_20210226.road_line;
Time: 67.216 ms
```
How much WAL does that generate? Less than 1 MB! That is 99.8% less WAL generated by removing the partition instead of deleting from the stacked table. This is the type of benefit I had been looking for.
```
┌────────────────┐
│ pg_size_pretty │
╞════════════════╡
│ 674 kB         │
└────────────────┘
```
Note: Creating a table to track the WAL inadvertently generates WAL in the process! 18 kB of the 674 kB reported above was WAL from creating the table to track the WAL usage.
Dump and Restore in Parallel
Another area of maintenance that can benefit from table partitioning is backup
and restore operations. Generally in Postgres, a table is associated with a file.
The larger the table, the larger the file. Having one big table in one big file
results in backup/restore operations getting stuck using a single CPU.
I tested this with `pg_dump` and `pg_restore` using directory mode (`-Fd`) and multiple processes (`-j3`). Example commands for the dump/restore might look like the following.
```bash
pg_dump -Fd -f /path/to/backup -j3 -d db_name
pg_restore -Fd -j3 -d db_name /path/to/backup
```
The database with the partitioned table was 57% faster with `pg_dump` and 33% faster with `pg_restore`. I have not tested this with pgBackRest yet but assume similar benefits will be observed there as well.
Postgres' table partitioning is going to be a great benefit to our long-term OpenStreetMap data management! My main goal for improvement was related to bulk loading and removing data, and that has panned out as hoped. While query performance was not a primary goal, it is always exciting to see queries running 75% faster. My hope for query performance was simply that I would not see a decline like the one I found in the previous post.
It does deserve one last word of caution: proper attention must be given to the partitioning scheme to ensure the implementation meets your end use goals. A partitioning scheme that does not align with your query or data ingestion pattern will likely cause frustration. Plan, test, adjust, re-test, etc., until you find what works for you!
Need help with your PostgreSQL servers or databases? Contact us to start the conversation!
Published February 26, 2021
Last Updated February 26, 2021