Why Partition OpenStreetMap data?
This post covers the first part of my path in considering native Postgres partitioning and how it might be helpful to my work with OpenStreetMap data in PostGIS. Partitioning tables in Postgres can have significant benefits when working with larger data sets, and OpenStreetMap data as a whole is generally considered a large data set. The post following this one will outline the steps I am taking to implement partitioning with data loaded by PgOSM-Flex. A third post is planned to dive into the impacts on performance this change has.
Table partitioning is not an architecture that should be implemented casually without planning and good reason. The consequences of a poorly planned and implemented partitioning scheme can be severe. This is why it is worth the extra time to plan, evaluate and test this option before making any lasting implementation decisions. This post starts by examining the work flow I have used with OpenStreetMap data, challenges with my legacy approach, and highlights where I think Postgres partitioning can provide serious improvement. My next post shows how I am approaching the task of partitioning OpenStreetMap data.
At the time of writing this post I have not decided if this is a path I will continue down for production use. I have not started testing and collecting data for the 3rd post. I will likely make the "Go / No-Go" decision while I am collecting data for the performance related post.
OpenStreetMap for Analytics
The main way I use OpenStreetMap data is within analytics style projects. Routing, travel times, watersheds, urban growth, and land usage are all easily within scope for OpenStreetMap data in PostGIS.
Load and query Pi-hole data from Postgres
I have used Pi-hole on our local network for a
few years now. It is running on a dedicated Raspberry Pi 3B attached
to the router (Netgear Nighthawk) to provide fast local DNS/DCHP while
blocking ads at the network level.
The built-in Pi-hole web interface allows for some basic querying/reporting
of the collected data, but it's a bit limited and quite slow as the data
grows over time. My current pihole-FTL.db
database is 1.4 GB and contains 12 months of data.
$ ls -alh /etc/pihole/pihole-FTL.db
-rw-r--r-- 1 pihole pihole 1.4G Jan 31 14:04 pihole-FTL.db
Pi-hole saves its data in a few SQLite databases with the FTL database (Faster Than Light) being the most interesting. While I could try to work with the data directly in SQLite, I strongly prefer Postgres and decided this was a great time to give the pgspider/sqlite_fdw extension a try. This post goes over the steps I took to bring Pi-hole data into Postgres from its sqlite data source.
See my previous post on using
file_fdw
for more about Postgres' Foreign Data Wrappers.
Better OpenStreetMap places in PostGIS
Data quality is important. This post continues exploring the improved quality of OpenStreetMap data loaded to Postgres/PostGIS via PgOSM-Flex. These improvements are enabled by the new flex output of osm2pgsql, making it easier to understand and consume OpenStreetMap data for analytic purposes.
I started exploring the Flex output a few weeks ago, the post before this one used PgOSM-Flex v0.0.3. This post uses PgOSM-Flex v0.0.7 and highlights a few cool improvements by exploring the OSM place data. Some of the improvements made of the past few weeks were ideas brought over from the legacy PgOSM project. Other improvements were spurred by questions and conversations with the community, such as the nested admin polygons.
Improved places
This post focuses on the osm.place_polygon
data that stores things like
city, county and Country boundaries, along with neighborhoods and other details.
The the format of place data has a number of improvements covered in this post:
- Consolidated name
- Remove duplication between relation/member polygons
- Boundary hierarchy
The data loaded for this post is the U.S. West sub-region from
Geofabrik. It was loaded using the run-all.lua
and run-all.sql
scripts
in
PgOSM-Flex.
Improved OpenStreetMap data structure in PostGIS
It was nearly a decade ago when I first loaded OpenStreetMap data to PostGIS.
Over the years my fingers have typed osm2pgsql --slim --drop ...
countless times
and I do not see an end to that trend anytime soon.
One thing that is changing is that getting high quality OpenStreetMap data into
PostGIS is easier than ever!
This improvement in data quality is made possible by the new Flex output available in osm2pgsql 1.4.0,
I wrote about my initial impressions of the Flex output a few weeks ago.
This post looks at how I am starting to use osm2pgsql's Flex output to provide a
standardized and sanitized OpenStreetMap data set in Postgres/PostGIS.
No longer is osm2pgsql limited to loading data to the 3-table structure
(planet_osm_point
, planet_osm_line
and planet_osm_polygon
)
so I am eagerly converting to the Flex output and taking advantage of these changes!
It is also easier than ever to create customized mix-and-match data loads
for customized needs of specific projects.
Hands on with osm2pgsql's new Flex output
The osm2pgsql project has seen quite a bit of development over the past couple of years. This is a core piece of software used by a large number of people to load OpenStreetMap data into PostGIS / PostgreSQL databases, so it has been great to see the activity and improvements. Recently, I was contacted by Jochen Topf to see if I would give one of those (big!) improvements, osm2pgsql's new Flex output, a try. While the flex output is still marked as "experimental" it is already quite robust. In fact, I have already started thinking of the typical pgsql output I have used for nearly a decade as "the old output!"
So what does this new Flex output do for us? It gives us control over the imported data's format, structure and quality. This process uses Lua styles (scripts) to achieve powerful results. The legacy pgsql output from osm2pgsql gave you three (3) main tables with everything organized into points, lines and polygons, solely by geometry type. From a database design perspective this would be like keeping product prices, employee salaries and expense reports all in one table using the justification "they all deal with money." With the flex output we are no longer constrained by this legacy design. With that in mind, the rest of this post explores osm2pgsql's Flex output.