Partition OpenStreetMap data in PostGIS
This post continues my quest to explore Postgres native
partitioning and determine if it is a good fit for my OpenStreetMap data in PostGIS.
I show how I am planning to implement a partitioning
scheme in a way that a) works well for my use case, and b) is easy
to implement and maintain.
My previous post covered why
I think partitioning will be a benefit
in our databases.
The following steps are the result of a few iterations of ideas, including asking the osm2pgsql team about one idea I had. Ultimately, I think it is good that osm2pgsql will not support the idea I had asked about there. It forced me to rethink my approach and end up at a better solution. The reality is that partitioning only make sense if the partitioning scheme supports the end use, and end uses are quite varied. Trying to automate partitioning directly in the PgOSM-Flex project would have greatly increased costs of maintaining that project, and likely wasted a ton of time.
The post after this one shows that the plan outlined in this post is not perfect, though it shows great promise. There will be at least one more post to outline how everything all work out!
Prepare for partitioning
I am using OpenStreetMap data loaded by
PgOSM-Flex,
testing with the osm.road_line
table
(see more posts on PgOSM-Flex). Eventually I plan
to partition a few tables from the imported data, but to start with I am working
only with the roads.
Why Partition OpenStreetMap data?
This post covers the first part of my path in considering native Postgres partitioning and how it might be helpful to my work with OpenStreetMap data in PostGIS. Partitioning tables in Postgres can have significant benefits when working with larger data sets, and OpenStreetMap data as a whole is generally considered a large data set. The post following this one will outline the steps I am taking to implement partitioning with data loaded by PgOSM-Flex. A third post is planned to dive into the impacts on performance this change has.
Table partitioning is not an architecture that should be implemented casually without planning and good reason. The consequences of a poorly planned and implemented partitioning scheme can be severe. This is why it is worth the extra time to plan, evaluate and test this option before making any lasting implementation decisions. This post starts by examining the work flow I have used with OpenStreetMap data, challenges with my legacy approach, and highlights where I think Postgres partitioning can provide serious improvement. My next post shows how I am approaching the task of partitioning OpenStreetMap data.
At the time of writing this post I have not decided if this is a path I will continue down for production use. I have not started testing and collecting data for the 3rd post. I will likely make the "Go / No-Go" decision while I am collecting data for the performance related post.
OpenStreetMap for Analytics
The main way I use OpenStreetMap data is within analytics style projects. Routing, travel times, watersheds, urban growth, and land usage are all easily within scope for OpenStreetMap data in PostGIS.
Load and query Pi-hole data from Postgres
I have used Pi-hole on our local network for a
few years now. It is running on a dedicated Raspberry Pi 3B attached
to the router (Netgear Nighthawk) to provide fast local DNS/DCHP while
blocking ads at the network level.
The built-in Pi-hole web interface allows for some basic querying/reporting
of the collected data, but it's a bit limited and quite slow as the data
grows over time. My current pihole-FTL.db
database is 1.4 GB and contains 12 months of data.
$ ls -alh /etc/pihole/pihole-FTL.db
-rw-r--r-- 1 pihole pihole 1.4G Jan 31 14:04 pihole-FTL.db
Pi-hole saves its data in a few SQLite databases with the FTL database (Faster Than Light) being the most interesting. While I could try to work with the data directly in SQLite, I strongly prefer Postgres and decided this was a great time to give the pgspider/sqlite_fdw extension a try. This post goes over the steps I took to bring Pi-hole data into Postgres from its sqlite data source.
See my previous post on using
file_fdw
for more about Postgres' Foreign Data Wrappers.
Better OpenStreetMap places in PostGIS
Data quality is important. This post continues exploring the improved quality of OpenStreetMap data loaded to Postgres/PostGIS via PgOSM-Flex. These improvements are enabled by the new flex output of osm2pgsql, making it easier to understand and consume OpenStreetMap data for analytic purposes.
I started exploring the Flex output a few weeks ago, the post before this one used PgOSM-Flex v0.0.3. This post uses PgOSM-Flex v0.0.7 and highlights a few cool improvements by exploring the OSM place data. Some of the improvements made of the past few weeks were ideas brought over from the legacy PgOSM project. Other improvements were spurred by questions and conversations with the community, such as the nested admin polygons.
Improved places
This post focuses on the osm.place_polygon
data that stores things like
city, county and Country boundaries, along with neighborhoods and other details.
The the format of place data has a number of improvements covered in this post:
- Consolidated name
- Remove duplication between relation/member polygons
- Boundary hierarchy
The data loaded for this post is the U.S. West sub-region from
Geofabrik. It was loaded using the run-all.lua
and run-all.sql
scripts
in
PgOSM-Flex.
Improved OpenStreetMap data structure in PostGIS
It was nearly a decade ago when I first loaded OpenStreetMap data to PostGIS.
Over the years my fingers have typed osm2pgsql --slim --drop ...
countless times
and I do not see an end to that trend anytime soon.
One thing that is changing is that getting high quality OpenStreetMap data into
PostGIS is easier than ever!
This improvement in data quality is made possible by the new Flex output available in osm2pgsql 1.4.0,
I wrote about my initial impressions of the Flex output a few weeks ago.
This post looks at how I am starting to use osm2pgsql's Flex output to provide a
standardized and sanitized OpenStreetMap data set in Postgres/PostGIS.
No longer is osm2pgsql limited to loading data to the 3-table structure
(planet_osm_point
, planet_osm_line
and planet_osm_polygon
)
so I am eagerly converting to the Flex output and taking advantage of these changes!
It is also easier than ever to create customized mix-and-match data loads
for customized needs of specific projects.