PgOSM transformations explained
I wrote about the PgOSM project last month
(GitHub)
showing how to transform OpenStreetMap data into a more relational format.
The goal of this post is to show how you can easily customize the
transformations and reductions to suit your needs.
The prior post used a command to load a layer_definitions.sql
file
(view on GitHub)
under the section Deploy schema and define layers.
psql -d pgosm -f ~/git/pgosm/db/data/layer_definitions.sql
Customizing the data loaded in that step is the key to changing how PgOSM transforms, reduces and organizes your OpenStreetMap data. This post shows you how to do that!
This post is outdated. The PgOSM project was replaced by PgOSM Flex. See Better OpenStreetMap data using PgOSM Flex 0.6.0 for a more recent approach.
History of PgOSM
I came up with the core of this project in 2015 over the course of a weekend or two. Since then it has continued to serve my needs reliably, mostly without updates, ready to load fresh OpenStreetMap data any time I asked. Recently I decided to clean up the project and release it under the MIT license. I've always felt like it's an ugly solution, it's far from perfect... but being reliable and customizable counts for a lot!
When I started this project I was already sourcing my OpenStreetMap data from Geofabrik, and they're kind enough to publish a detailed specification classifying layers. The codes used in the default PgOSM transformation were derived from an older version of this document. Since then, no effort has been made to keep in sync with Geofabrik's definitions.
The layer groups table (pgosm.layer_group
) defines the broad classifications
using a range of 4-character codes. The layer detail table (pgosm.layer_detail
)
defines how to identify the individual codes that make up the large layer groups.
Layer groups
The broad layer groups classify data logically and define the tables to be generated from the source OSM data. The main elements of the layer group data are:
- Code range
- Class name
- OSM Columns
Example layer group
To better understand how this first part works let's look at the
layer_group
for buildings. This INSERT
statement adds the row for the group.
INSERT INTO pgosm.layer_group (code_start, code_end, class, osm_columns, description)
VALUES ('1500','1599','building',
'osm_id, "addr:housename" AS housename, "addr:housenumber" AS housenumber,
building, name, place, COALESCE(tags -> ''building:levels'', ''1'') AS levels,
COALESCE(tags -> ''building:height'', ''4'') AS height, tags, way','')
;
Code range
The code range for the building layer is 1500
to 1599
. This range
is used to define which records from pgosm.layer_detail
relate
to the layer group.
I plan to upgrade the PgOSM schema to use Postgres' range data type.
Class name
The class
column is used to provide the base table name
(e.g. building_
) of the generated table for the group.
The second part of the table name is determined by the data defined by
the pgosm.layer_detail
records discussed later.
OSM Columns
The osm_columns
column defines which columns to include in the target table.
This must be a valid snippet of SQL code and is used to generate the SELECT
clause of the statement to generate the reduced layer.
Remember, OSM data was loaded via
osm2pgsql
with thehstore
option so there is data available in dedicated columns plus many additional attributes stored in thetags
column.
This whole block goes into a single column.
'osm_id, "addr:housename" AS housename, "addr:housenumber" AS housenumber, building, name, place,
COALESCE(tags -> ''building:levels'', ''1'') AS levels, COALESCE(tags -> ''building:height'', ''4'') AS height,
tags, way'
Notice I have worked in some data quality improvements into this query. Two examples:
Goal: Improve column names
The block "addr:housenumber" AS housenumber
shows how to rename poorly named
columns. Why is addr:housenumber
a terrible name? The colon (:)
character in the name requires putting the column in double quotes (") in
every query. The housenumber
name does not have the same requirement.
Goal: Usable data from HSTORE
Depending on the data you want to access from OpenStreetMap, you may need to look
into the tags HSTORE
column. In the
case of buildings, I want to have the building height available. I do that
with the following code, and add a COALESCE
to set a default building height
of 4 meters when it isn't set in the data.
COALESCE(tags -> ''building:height'', ''4'') AS height
Note the double single quotes (
'
) above are required for proper escaping toINSERT
the data.
Layer Detail
Layer details are sub-categories of the table's main group. These
are setup to be similar to GIS
feature classes.
The layer detail defines the expected spatial types (point/line/polygon)
and the WHERE
clause used to identify the proper rows.
Example layer details
The two following code examples illustrate how the pgosm.layer_detail
relates
to pgosm.layer_group
. The first block adds a detail layer with the code 1500
to represent buildings as a polygon layer and identifies those rows with the
filter building IS NOT NULL
.
INSERT INTO pgosm.layer_detail (code, subclass, geom_point, geom_line, geom_polygon, osm_tag_filter, description) VALUES ('1500','building','FALSE','FALSE','TRUE','building IS NOT NULL','');
They second detail layer is a point layer with code 1551
for building entrances.
Notice the filter added to osm_tag_filter
is far more complex than the prior
building example.
INSERT INTO pgosm.layer_detail (code, subclass, geom_point, geom_line, geom_polygon, osm_tag_filter, description) VALUES ('1551','entrance','TRUE','FALSE','FALSE','building = ''entrance'' OR (tags -> ''entrance'' IN (''main'',''yes'' ))','');
Create link between tables
The pgosm.layer_detail
table has a NULL
-able foreign key to pgosm.layer_goup
.
This command will update the layer to set the foreign key values.
UPDATE pgosm.layer_detail ld
SET layer_group_id = l.layer_group_id
FROM pgosm.layer_group l
WHERE ld.code BETWEEN l.code_start AND l.code_end
;
I remember writing this code back in 2015 thinking I would come back and make so many things better, cleaner and "the right way." The step above is a perfect example...
A practical example
Copy and paste the following code into a file named
pgosm_county_reduction_definition.sql
.
This code creates three groups, each with one detail layer filtered for county
polygons.
The first INSERT
query adds the three groups, the important
difference is how the geometry column is handled.
In the first one (code 1106) we just
use way
as-is. The second (code 1150) simplifies using a tolerance of 10, and the third (code 1151) simplifies with a tolerance of 100. The final query handles
establishing the relationship between the two.
Why simplify your spatial data? See my Tame your spatial data posts: Part 1; Part 2
INSERT INTO pgosm.layer_group (code_start, code_end, class, osm_columns, description)
VALUES ('1106','1106','county','osm_id, name, admin_level,
boundary, way',''),
('1150','1150','county_thematic10','osm_id, name, admin_level,
boundary, ST_Simplify(way, 10) AS way',''),
('1151','1151','county_thematic100','osm_id, name, admin_level,
boundary, ST_Simplify(way, 100) AS way','')
;
INSERT INTO pgosm.layer_detail (code, subclass, geom_point, geom_line, geom_polygon,
osm_tag_filter, description)
VALUES ('1106','county','FALSE','FALSE','TRUE',
'boundary=''administrative'' AND admin_level=''6''',''),
('1150','county','FALSE','FALSE','TRUE',
'boundary=''administrative'' AND admin_level=''6''',''),
('1151','county','FALSE','FALSE','TRUE',
'boundary=''administrative'' AND admin_level=''6''','')
;
UPDATE pgosm.layer_detail ld
SET layer_group_id = l.layer_group_id
FROM pgosm.layer_group l
WHERE ld.code BETWEEN l.code_start AND l.code_end
;
In a clean load of pgosm
database, load the layer definition data and finish
the process as described in my prior post.
psql -d pgosm -f pgosm_county_reduction_definition.sql
When the process is completed there are three (3) tables in the osm schema.
\dt+ osm.
List of relations
┌────────┬────────────────────────────┬───────┬──────────────┬────────┬─────────────┐
│ Schema │ Name │ Type │ Owner │ Size │ Description │
╞════════╪════════════════════════════╪═══════╪══════════════╪════════╪═════════════╡
│ osm │ county_polygon │ table │ postgres │ 528 kB │ │
│ osm │ county_thematic100_polygon │ table │ postgres │ 144 kB │ │
│ osm │ county_thematic10_polygon │ table │ postgres │ 328 kB │ │
└────────┴────────────────────────────┴───────┴──────────────┴────────┴─────────────┘
(3 rows)
Table downloads
I used pg_dump
to create raw dumps of these individual tables. They are
available for download in the table below.
File | Size (compressed) | Size (uncompressed) | md5sum |
---|---|---|---|
Original polygon | 417 kB | 1.4MB | Download checksum |
Simple-10 polygon | 226 kB | 777kB | Download checksum |
Simple-100 polygon | 63 kB | 242kB | Download checksum |
Summary
This post explained the logic behind the data loaded to the PgOSM database during processing and how to customize the data. I finished by showing an example of how new layer sets can be created to simplify the data.
It has been a lot of fun getting this project out into the wild. There are still
plenty of rough edges, but the process of sharing and writing about it has
forced encouraged
me to continue to make the project better for my own purposes and benefit.
I hope others will find this project helpful and enable them to more effectively use OpenStreetMap data in PostGIS.
Need help with your PostgreSQL servers or databases? Contact us to start the conversation!