Round Two: Partitioning OpenStreetMap
A few weeks ago I decided to seriously consider Postgres' declarative table partitioning for our OpenStreetMap data. Once the decision was made to investigate this option, I outlined our use case with requirements to keep multiple versions of OpenStreetMap data over time. That process helped draft my initial plan for how to create and manage the partitioned data. When I put the initial code to the test I found a snag and adjusted the plan.
This post shows a working example of how to partition OpenStreetMap data loaded using PgOSM-Flex.
It works, I love it! I am moving forward with the plan outlined in this post. Some highlights from testing with Colorado sized data:
- Bulk import generates 17% less WAL
- Bulk delete generates 99.8% less WAL
- Simple aggregate query runs 75% faster
First Review of Partitioning OpenStreetMap
My previous two posts set the stage to evaluate declarative Postgres partitioning for OpenStreetMap data. This post outlines what I found when I tested my plan and outlines my next steps. The goal with this series is to determine if partitioning is a path worth going down, or if the additional complexity outweighs any benefits. The first post on partitioning outlined my use case and why I thought partitioning would be a potential benefit. The maintenance aspects of partitioning are my #1 hope for improvement, with easy and fast loading and removal of entire data sets being a big deal for me.
The second post detailed my approach to partitioning to allow me to partition based on date and region. In that post I even bragged that a clever workaround was a suitable solution.
"No big deal, creating the
osmp.pgosm_flex_partitiontable gives each
regiona single ID to use to define list partitions." -- Arrogant Me
Read on to see where that assumption fell apart and my planned next steps.
I was hoping to have made a "Go / No-Go" decision by this point... I am currently at a solid "Probably!"
For testing I simulated Colorado data being loaded once per month on the 1st of each month and North America once per year on January 1. This was conceptually easier to implement and test than trying to capture exactly what I described in my initial post. This approach resulted in 17 snapshots of OpenStreetMap being loaded, 15 with Colorado and two with North America. I loaded this data twice, once using the planned partitioned setup and the other using a simple stacked table to compare performance against.
Partition OpenStreetMap data in PostGIS
This post continues my quest to explore Postgres native
partitioning and determine if it is a good fit for my OpenStreetMap data in PostGIS.
I show how I am planning to implement a partitioning
scheme in a way that a) works well for my use case, and b) is easy
to implement and maintain.
My previous post covered why I think partitioning will be a benefit in our databases.
The following steps are the result of a few iterations of ideas, including asking the osm2pgsql team about one idea I had. Ultimately, I think it is good that osm2pgsql will not support the idea I had asked about there. It forced me to rethink my approach and end up at a better solution. The reality is that partitioning only make sense if the partitioning scheme supports the end use, and end uses are quite varied. Trying to automate partitioning directly in the PgOSM-Flex project would have greatly increased costs of maintaining that project, and likely wasted a ton of time.
The post after this one shows that the plan outlined in this post is not perfect, though it shows great promise. There will be at least one more post to outline how everything all work out!
Prepare for partitioning
I am using OpenStreetMap data loaded by
testing with the
(see more posts on PgOSM-Flex). Eventually I plan
to partition a few tables from the imported data, but to start with I am working
only with the roads.
Why Partition OpenStreetMap data?
This post covers the first part of my path in considering native Postgres partitioning and how it might be helpful to my work with OpenStreetMap data in PostGIS. Partitioning tables in Postgres can have significant benefits when working with larger data sets, and OpenStreetMap data as a whole is generally considered a large data set. The post following this one will outline the steps I am taking to implement partitioning with data loaded by PgOSM-Flex. A third post is planned to dive into the impacts on performance this change has.
Table partitioning is not an architecture that should be implemented casually without planning and good reason. The consequences of a poorly planned and implemented partitioning scheme can be severe. This is why it is worth the extra time to plan, evaluate and test this option before making any lasting implementation decisions. This post starts by examining the work flow I have used with OpenStreetMap data, challenges with my legacy approach, and highlights where I think Postgres partitioning can provide serious improvement. My next post shows how I am approaching the task of partitioning OpenStreetMap data.
At the time of writing this post I have not decided if this is a path I will continue down for production use. I have not started testing and collecting data for the 3rd post. I will likely make the "Go / No-Go" decision while I am collecting data for the performance related post.
OpenStreetMap for Analytics
The main way I use OpenStreetMap data is within analytics style projects. Routing, travel times, watersheds, urban growth, and land usage are all easily within scope for OpenStreetMap data in PostGIS.
Load and query Pi-hole data from Postgres
I have used Pi-hole on our local network for a
few years now. It is running on a dedicated Raspberry Pi 3B attached
to the router (Netgear Nighthawk) to provide fast local DNS/DCHP while
blocking ads at the network level.
The built-in Pi-hole web interface allows for some basic querying/reporting
of the collected data, but it's a bit limited and quite slow as the data
grows over time. My current
pihole-FTL.db database is 1.4 GB and contains 12 months of data.
$ ls -alh /etc/pihole/pihole-FTL.db -rw-r--r-- 1 pihole pihole 1.4G Jan 31 14:04 pihole-FTL.db
Pi-hole saves its data in a few SQLite databases with the FTL database (Faster Than Light) being the most interesting. While I could try to work with the data directly in SQLite, I strongly prefer Postgres and decided this was a great time to give the pgspider/sqlite_fdw extension a try. This post goes over the steps I took to bring Pi-hole data into Postgres from its sqlite data source.
See my previous post on using
file_fdwfor more about Postgres' Foreign Data Wrappers.