PgOSM transformations explained
I wrote about the PgOSM project last month
showing how to transform OpenStreetMap data into a more relational format.
The goal of this post is to show how you can easily customize the
transformations and reductions to suit your needs.
The prior post used a command to load a
(view on GitHub)
under the section Deploy schema and define layers.
psql -d pgosm -f ~/git/pgosm/db/data/layer_definitions.sql
Customizing the data loaded in that step is the key to changing how PgOSM transforms, reduces and organizes your OpenStreetMap data. This post shows you how to do that!
History of PgOSM
I came up with the core of this project in 2015 over the course of a weekend or two. Since then it has continued to serve my needs reliably, mostly without updates, ready to load fresh OpenStreetMap data any time I asked. Recently I decided to clean up the project and release it under the MIT license. I've always felt like it's an ugly solution, it's far from perfect... but being reliable and customizable counts for a lot!
When I started this project I was already sourcing my OpenStreetMap data from Geofabrik, and they're kind enough to publish a detailed specification classifying layers. The codes used in the default PgOSM transformation were derived from an older version of this document. Since then, no effort has been made to keep in sync with Geofabrik's definitions.
The layer groups table (
pgosm.layer_group) defines the broad classifications
using a range of 4-character codes. The layer detail table (
defines how to identify the individual codes that make up the large layer groups.
The broad layer groups classify data logically and define the tables to be generated from the source OSM data. The main elements of the layer group data are:
- Code range
- Class name
- OSM Columns
Example layer group
To better understand how this first part works let's look at the
layer_group for buildings. This
INSERT statement adds the row for the group.
INSERT INTO pgosm.layer_group (code_start, code_end, class, osm_columns, description) VALUES ('1500','1599','building', 'osm_id, "addr:housename" AS housename, "addr:housenumber" AS housenumber, building, name, place, COALESCE(tags -> ''building:levels'', ''1'') AS levels, COALESCE(tags -> ''building:height'', ''4'') AS height, tags, way','') ;
The code range for the building layer is
1599. This range
is used to define which records from
to the layer group.
I plan to upgrade the PgOSM schema to use Postgres' range data type.
class column is used to provide the base table name
building_) of the generated table for the group.
The second part of the table name is determined by the data defined by
pgosm.layer_detail records discussed later.
osm_columns column defines which columns to include in the target table.
This must be a valid snippet of SQL code and is used to generate the
clause of the statement to generate the reduced layer.
Remember, OSM data was loaded via
hstoreoption so there is data available in dedicated columns plus many additional attributes stored in the
This whole block goes into a single column.
'osm_id, "addr:housename" AS housename, "addr:housenumber" AS housenumber, building, name, place, COALESCE(tags -> ''building:levels'', ''1'') AS levels, COALESCE(tags -> ''building:height'', ''4'') AS height, tags, way'
Notice I have worked in some data quality improvements into this query. Two examples:
Goal: Improve column names
"addr:housenumber" AS housenumber shows how to rename poorly named
columns. Why is
addr:housenumber a terrible name? The colon (:)
character in the name requires putting the column in double quotes (") in
every query. The
housenumber name does not have the same requirement.
Goal: Usable data from
Depending on the data you want to access from OpenStreetMap, you may need to look
tags HSTORE column. In the
case of buildings, I want to have the building height available. I do that
with the following code, and add a
COALESCE to set a default building height
of 4 meters when it isn't set in the data.
COALESCE(tags -> ''building:height'', ''4'') AS height
Note the double single quotes (
') above are required for proper escaping to
Layer details are sub-categories of the table's main group. These
are setup to be similar to GIS
The layer detail defines the expected spatial types (point/line/polygon)
WHERE clause used to identify the proper rows.
Example layer details
The two following code examples illustrate how the
pgosm.layer_group. The first block adds a detail layer with the code
to represent buildings as a polygon layer and identifies those rows with the
building IS NOT NULL.
INSERT INTO pgosm.layer_detail (code, subclass, geom_point, geom_line, geom_polygon, osm_tag_filter, description) VALUES ('1500','building','FALSE','FALSE','TRUE','building IS NOT NULL','');
They second detail layer is a point layer with code
1551 for building entrances.
Notice the filter added to
osm_tag_filter is far more complex than the prior
INSERT INTO pgosm.layer_detail (code, subclass, geom_point, geom_line, geom_polygon, osm_tag_filter, description) VALUES ('1551','entrance','TRUE','FALSE','FALSE','building = ''entrance'' OR (tags -> ''entrance'' IN (''main'',''yes'' ))','');
Create link between tables
pgosm.layer_detail table has a
NULL-able foreign key to
This command will update the layer to set the foreign key values.
UPDATE pgosm.layer_detail ld SET layer_group_id = l.layer_group_id FROM pgosm.layer_group l WHERE ld.code BETWEEN l.code_start AND l.code_end ;
I remember writing this code back in 2015 thinking I would come back and make so many things better, cleaner and "the right way." The step above is a perfect example...
A practical example
Copy and paste the following code into a file named
This code creates three groups, each with one detail layer filtered for county
INSERT query adds the three groups, the important
difference is how the geometry column is handled.
In the first one (code 1106) we just
way as-is. The second (code 1150) simplifies using a tolerance of 10, and the third (code 1151) simplifies with a tolerance of 100. The final query handles
establishing the relationship between the two.
INSERT INTO pgosm.layer_group (code_start, code_end, class, osm_columns, description) VALUES ('1106','1106','county','osm_id, name, admin_level, boundary, way',''), ('1150','1150','county_thematic10','osm_id, name, admin_level, boundary, ST_Simplify(way, 10) AS way',''), ('1151','1151','county_thematic100','osm_id, name, admin_level, boundary, ST_Simplify(way, 100) AS way','') ; INSERT INTO pgosm.layer_detail (code, subclass, geom_point, geom_line, geom_polygon, osm_tag_filter, description) VALUES ('1106','county','FALSE','FALSE','TRUE', 'boundary=''administrative'' AND admin_level=''6''',''), ('1150','county','FALSE','FALSE','TRUE', 'boundary=''administrative'' AND admin_level=''6''',''), ('1151','county','FALSE','FALSE','TRUE', 'boundary=''administrative'' AND admin_level=''6''','') ; UPDATE pgosm.layer_detail ld SET layer_group_id = l.layer_group_id FROM pgosm.layer_group l WHERE ld.code BETWEEN l.code_start AND l.code_end ;
In a clean load of
pgosm database, load the layer definition data and finish
the process as described in my prior post.
psql -d pgosm -f pgosm_county_reduction_definition.sql
When the process is completed there are three (3) tables in the osm schema.
\dt+ osm. List of relations ┌────────┬────────────────────────────┬───────┬──────────────┬────────┬─────────────┐ │ Schema │ Name │ Type │ Owner │ Size │ Description │ ╞════════╪════════════════════════════╪═══════╪══════════════╪════════╪═════════════╡ │ osm │ county_polygon │ table │ postgres │ 528 kB │ │ │ osm │ county_thematic100_polygon │ table │ postgres │ 144 kB │ │ │ osm │ county_thematic10_polygon │ table │ postgres │ 328 kB │ │ └────────┴────────────────────────────┴───────┴──────────────┴────────┴─────────────┘ (3 rows)
pg_dump to create raw dumps of these individual tables. They are
available for download in the table below.
|File||Size (compressed)||Size (uncompressed)||md5sum|
|Original polygon||417 kB||1.4MB||Download checksum|
|Simple-10 polygon||226 kB||777kB||Download checksum|
|Simple-100 polygon||63 kB||242kB||Download checksum|
This post explained the logic behind the data loaded to the PgOSM database during processing and how to customize the data. I finished by showing an example of how new layer sets can be created to simplify the data.
It has been a lot of fun getting this project out into the wild. There are still
plenty of rough edges, but the process of sharing and writing about it has
me to continue to make the project better for my own purposes and benefit.
I hope others will find this project helpful and enable them to more effectively use OpenStreetMap data in PostGIS.
Need help with your PostgreSQL servers or databases? Contact us to start the conversation!
Published February 21, 2019
Last Updated February 21, 2019