Round Two: Partitioning OpenStreetMap
A few weeks ago I decided to seriously consider Postgres' declarative table partitioning for our OpenStreetMap data. Once the decision was made to investigate this option, I outlined our use case with requirements to keep multiple versions of OpenStreetMap data over time. That process helped draft my initial plan for how to create and manage the partitioned data. When I put the initial code to the test I found a snag and adjusted the plan.
This post shows a working example of how to partition OpenStreetMap data loaded using PgOSM-Flex.
TLDR
Spoiler alert!
It works, I love it! I am moving forward with the plan outlined in this post. Some highlights from testing with Colorado-sized data:
- Bulk import generates 17% less WAL
- Bulk delete generates 99.8% less WAL
- Simple aggregate query runs 75% faster
Set up new partition
At the end of the previous post I had decided to simplify and use only the osm_date for the partition key. The following code creates the new partitioned table, still using the LIST-based approach.
CREATE SCHEMA osmp;
COMMENT ON SCHEMA osmp IS 'Objects for partitioned OpenStreetMap data';
CREATE TABLE osmp.road_line (
    osm_id BIGINT NOT NULL,
    osm_type TEXT NOT NULL,
    "name" TEXT NULL,
    "ref" TEXT NULL,
    maxspeed INT NULL,
    oneway SMALLINT NULL,
    layer INT NOT NULL,
    tunnel TEXT NULL,
    bridge TEXT NULL,
    major BOOLEAN NOT NULL,
    route_foot BOOLEAN NULL,
    route_cycle BOOLEAN NULL,
    route_motor BOOLEAN NULL,
    geom GEOMETRY(LINESTRING, 3857) NULL,
    osm_date DATE NOT NULL,
    CONSTRAINT pk_osm_road_line_osm_id_osm_date
        PRIMARY KEY (osm_date, osm_id)
) PARTITION BY LIST (osm_date)
;
COMMENT ON TABLE osmp.road_line IS 'Partitioned OpenStreetMap roads, structure matches PgOSM-Flex v0.1.1.';
The steps to add a new partition using the osm_date column as the partition key are below. First, add the osm_date column, populate it with an UPDATE, and set it NOT NULL.
ALTER TABLE osm.road_line ADD osm_date DATE NULL;
UPDATE osm.road_line r
SET osm_date = f.osm_date
FROM osm.pgosm_flex f
WHERE r.osm_date IS NULL
;
ALTER TABLE osm.road_line
ALTER COLUMN osm_date
SET NOT NULL;
Remove the original single-column PRIMARY KEY and create a new PRIMARY KEY that includes the osm_date partition key.
ALTER TABLE osm.road_line
DROP CONSTRAINT pk_osm_road_line_osm_id
;
ALTER TABLE osm.road_line
ADD CONSTRAINT pk_osm_road_line_osm_date_osm_id
PRIMARY KEY (osm_date, osm_id)
;
Earlier, when the osm_date column was added and populated, rows were rewritten, bloating the existing indexes on the osm.road_line table. Reindex the table.
REINDEX TABLE osm.road_line;
The previous post showed a query to look at index sizes. Before making any changes to the road_line table, the GIST index road_line_geom_idx took 44 MB on disk.
┌────────┬───────────┬────────────────────┬────────────┬──────────────────┐
│ s_name │ t_name    │ index_name         │ index_size │ index_size_bytes │
╞════════╪═══════════╪════════════════════╪════════════╪══════════════════╡
│ osm    │ road_line │ road_line_geom_idx │ 44 MB      │ 45875200         │
└────────┴───────────┴────────────────────┴────────────┴──────────────────┘
After creating and populating the osm_date column, the index more than doubled in size, as seen in the following table. The REINDEX TABLE command above resolves this index bloat.
┌────────┬───────────┬────────────────────┬────────────┬──────────────────┐
│ s_name │ t_name    │ index_name         │ index_size │ index_size_bytes │
╞════════╪═══════════╪════════════════════╪════════════╪══════════════════╡
│ osm    │ road_line │ road_line_geom_idx │ 91 MB      │ 95731712         │
└────────┴───────────┴────────────────────┴────────────┴──────────────────┘
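For reference, the sizes above come from the system catalogs. A query along these lines reproduces the columns shown; this is my reconstruction using the pg_indexes view, and the exact query from the previous post may differ.
-- Reconstructed index size query; the previous post's exact query may differ.
SELECT s.schemaname AS s_name,
    s.tablename AS t_name,
    s.indexname AS index_name,
    pg_size_pretty(pg_relation_size(format('%I.%I', s.schemaname, s.indexname)::regclass)) AS index_size,
    pg_relation_size(format('%I.%I', s.schemaname, s.indexname)::regclass) AS index_size_bytes
FROM pg_indexes s
WHERE s.schemaname = 'osm'
    AND s.tablename = 'road_line'
;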
Create a CHECK constraint matching the partition bound before attaching the partition; this lets Postgres skip the full validation scan that ATTACH PARTITION otherwise performs while holding a SHARE UPDATE EXCLUSIVE lock on the parent table. Drop the CHECK constraint after attaching because it is redundant with the partition bound.
ALTER TABLE osm.road_line
ADD CONSTRAINT ck_osm_co_road_line_partition
CHECK ( osm_date = '2021-02-24' )
;
ALTER TABLE osmp.road_line
ATTACH PARTITION osm.road_line
FOR VALUES IN ( '2021-02-24' )
;
ALTER TABLE osm.road_line
DROP CONSTRAINT ck_osm_co_road_line_partition;
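As an optional sanity check (not part of the original steps), the pg_partition_tree() function available in Postgres 12 and newer confirms the partition is attached where expected.
-- Optional check: list the partition tree under the parent table.
SELECT relid, parentrelid, isleaf, level
FROM pg_partition_tree('osmp.road_line');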
With the new table attached, that completes the standard steps that benefit from the osm schema name. Rename the schema to make room for future data loads.
ALTER SCHEMA osm RENAME TO osm_20210224;
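Future snapshots repeat this same pattern. The following condensed sketch illustrates a hypothetical follow-up load dated 2021-02-26, assuming PgOSM-Flex has loaded the new data into a fresh osm schema; it is a recap of the steps above, not new functionality.
-- Condensed sketch of the steps above for a hypothetical 2021-02-26 load.
ALTER TABLE osm.road_line ADD osm_date DATE NULL;
UPDATE osm.road_line r
    SET osm_date = f.osm_date
    FROM osm.pgosm_flex f
    WHERE r.osm_date IS NULL;
ALTER TABLE osm.road_line ALTER COLUMN osm_date SET NOT NULL;
ALTER TABLE osm.road_line DROP CONSTRAINT pk_osm_road_line_osm_id;
ALTER TABLE osm.road_line
    ADD CONSTRAINT pk_osm_road_line_osm_date_osm_id
    PRIMARY KEY (osm_date, osm_id);
REINDEX TABLE osm.road_line;
ALTER TABLE osm.road_line
    ADD CONSTRAINT ck_osm_co_road_line_partition
    CHECK (osm_date = '2021-02-26');
ALTER TABLE osmp.road_line
    ATTACH PARTITION osm.road_line
    FOR VALUES IN ('2021-02-26');
ALTER TABLE osm.road_line
    DROP CONSTRAINT ck_osm_co_road_line_partition;
ALTER SCHEMA osm RENAME TO osm_20210226;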
Verify query performance
While query performance is not the reason I decided to give partitioning a try, it is important to ensure partitioning does not severely degrade performance. Luckily, performance against the partitioned tables is generally as fast as, if not faster than, the "stacked" method. The plans from one query counting major roads show the partitioned query running 75% faster than the same query on the stacked table.
The partitioned query uses the index on the major column, while the stacked query chooses not to use it.
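The exact query from my testing is not shown here, but it followed a simple aggregate pattern along these lines (a sketch; the filter values are illustrative):
-- Hypothetical sketch of the aggregate pattern tested; filtering on
-- osm_date lets the planner prune to a single partition.
EXPLAIN (ANALYZE)
SELECT COUNT(*) AS major_roads
FROM osmp.road_line
WHERE osm_date = '2021-02-24'
    AND major
;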
Data loading
My initial testing shows that preparing partitioned OpenStreetMap data takes 20% longer in wall-clock time. In reality, that wall-clock time does not matter much. Preliminary testing on larger regions shows the pattern seems to hold as data size increases.
WAL generated during import
A more important way to look at the impact of this data ingestion is to examine how much WAL is generated. One way to check this is to make a note of the current WAL location before the load and check it again after the load. The pg_current_wal_lsn() function can be used to track this.
For easy reuse I'll save the result of pg_current_wal_lsn() to a table.
DROP TABLE IF EXISTS wal_check;
CREATE UNLOGGED TABLE wal_check AS
SELECT pg_current_wal_lsn() AS wal_lsn
;
Query the table to see the recorded position; the raw value does not mean much to us humans.
SELECT * FROM wal_check;
┌─────────────┐
│ wal_lsn │
╞═════════════╡
│ 12/4C4D7F60 │
└─────────────┘
After saving the position of the WAL, I loaded a new Colorado partition. The following query reports how much WAL has been generated since the above query was run; it shows 623 MB of WAL was generated.
SELECT pg_size_pretty(pg_current_wal_lsn() - w.wal_lsn) AS wal_size
FROM wal_check w
;
┌──────────┐
│ wal_size │
╞══════════╡
│ 623 MB │
└──────────┘
Next I dropped and recreated the table tracking the WAL position and loaded the same batch of Colorado data into the stacked table. Loading the stacked table generated 751 MB of WAL.
Loading the partitioned table generated 17% less WAL than loading the same data into the stacked table.
Deleting data
While I want to save snapshots of data over time, I also have no intention of keeping every version I load. This means I need to delete data in bulk!
Deleting data generates WAL, often in surprising amounts. This adds to the load on the disk (it has to write the WAL) and the network (WAL shipping, streaming replication). The test database used here is small, but even with Colorado-sized OSM data, tracking the WAL highlights the benefit of partitioning.
DELETE FROM osms.road_line WHERE osm_date = '2021-02-26';
DELETE 814705
Time: 1581.329 ms (00:01.581)
This DELETE statement generates 300 MB of WAL.
┌────────────────┐
│ pg_size_pretty │
╞════════════════╡
│ 300 MB │
└────────────────┘
By comparison, here are the equivalent commands to remove a partition.
ALTER TABLE osmp.road_line DETACH PARTITION osm_20210226.road_line;
Time: 60.112 ms
DROP TABLE osm_20210226.road_line;
Time: 67.216 ms
How much WAL does that generate? Less than 1 MB! That is 99.8% less WAL generated by removing the partition instead of deleting from the stacked table. This is the type of benefit I had been looking for.
┌────────────────┐
│ pg_size_pretty │
╞════════════════╡
│ 674 kB │
└────────────────┘
Note: Creating a table to track the WAL inadvertently generates WAL in the process! 18 kB of the 674 kB reported above was WAL from creating the table to track the WAL usage.
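One way to avoid that extra WAL is to skip the tracking table entirely and capture the LSN into a psql variable with \gset. This is a sketch of an alternative approach, not what I used in the tests above.
-- Capture the current LSN into a psql variable (no table, no extra WAL).
SELECT pg_current_wal_lsn() AS wal_lsn \gset
-- ... run the operation being measured ...
SELECT pg_size_pretty(pg_current_wal_lsn() - :'wal_lsn'::pg_lsn) AS wal_size;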
Dump and Restore in Parallel
Another area of maintenance that can benefit from table partitioning is backup and restore operations. Generally in Postgres, a table is associated with a file; the larger the table, the larger the file. Parallel dump and restore operate table by table, so having one big table in one big file results in backup/restore operations getting stuck using a single CPU.
Testing this with pg_dump and pg_restore using directory mode (-Fd) allows using multiple processes (-j3). Example commands for the dump/restore might look like the following.
pg_dump -Fd -f /path/to/backup -j3 -d db_name
pg_restore -Fd -j3 -d db_name /path/to/backup
The database with the partitioned table was 57% faster with pg_dump and 33% faster with pg_restore. I have not tested this with pgbackrest yet, but assume similar benefits will be observed there as well.
Summary
Postgres' table partitioning is going to be a great benefit to our long-term OpenStreetMap data management! My main goal for improvement was related to bulk loading and removing data, and that has panned out as hoped. While query performance was not a primary goal, it is always exciting to see queries running 75% faster. My hope on that front was simply to avoid the decline in performance I found in the previous post.
It does deserve one last word of caution: proper attention must be given to the partitioning scheme to ensure the implementation meets your end use goals. A partitioning scheme that does not align with your query or data ingestion pattern will likely cause frustration. Plan, test, adjust, re-test, etc., until you find what works for you!
Need help with your PostgreSQL servers or databases? Contact us to start the conversation!