RustProof Labs: blogging for education (logo)
My book Mastering PostGIS and OpenStreetMap is available!

PgOSM transformations explained

By Ryan Lambert -- Published February 21, 2019

I wrote about the PgOSM project last month (GitHub) showing how to transform OpenStreetMap data into a more relational format. The goal of this post is to show how you can easily customize the transformations and reductions to suit your needs. The prior post used a command to load a layer_definitions.sql file (view on GitHub) under the section Deploy schema and define layers.

psql -d pgosm -f ~/git/pgosm/db/data/layer_definitions.sql

Customizing the data loaded in that step is the key to changing how PgOSM transforms, reduces and organizes your OpenStreetMap data. This post shows you how to do that!

This post is outdated. The PgOSM project was replaced by PgOSM Flex. See Better OpenStreetMap data using PgOSM Flex 0.6.0 for a more recent approach.

History of PgOSM

I came up with the core of this project in 2015 over the course of a weekend or two. Since then it has continued to serve my needs reliably, mostly without updates, ready to load fresh OpenStreetMap data any time I asked. Recently I decided to clean up the project and release it under the MIT license. I've always felt like it's an ugly solution, it's far from perfect... but being reliable and customizable counts for a lot!

When I started this project I was already sourcing my OpenStreetMap data from Geofabrik, and they're kind enough to publish a detailed specification classifying layers. The codes used in the default PgOSM transformation were derived from an older version of this document. Since then, no effort has been made to keep in sync with Geofabrik's definitions.

The layer groups table (pgosm.layer_group) defines the broad classifications using a range of 4-character codes. The layer detail table (pgosm.layer_detail) defines how to identify the individual codes that make up the large layer groups.

Layer groups

The broad layer groups classify data logically and define the tables to be generated from the source OSM data. The main elements of the layer group data are:

Example layer group

To better understand how this first part works let's look at the layer_group for buildings. This INSERT statement adds the row for the group.

INSERT INTO pgosm.layer_group (code_start, code_end, class, osm_columns, description)
    VALUES ('1500','1599','building',
         'osm_id, "addr:housename" AS housename, "addr:housenumber" AS housenumber,
         building, name, place, COALESCE(tags -> ''building:levels'', ''1'') AS levels,
         COALESCE(tags -> ''building:height'', ''4'') AS height, tags, way','')
;

Code range

The code range for the building layer is 1500 to 1599. This range is used to define which records from pgosm.layer_detail relate to the layer group.

I plan to upgrade the PgOSM schema to use Postgres' range data type.

Class name

The class column is used to provide the base table name (e.g. building_) of the generated table for the group. The second part of the table name is determined by the data defined by the pgosm.layer_detail records discussed later.

OSM Columns

The osm_columns column defines which columns to include in the target table. This must be a valid snippet of SQL code and is used to generate the SELECT clause of the statement to generate the reduced layer.

Remember, OSM data was loaded via osm2pgsql with the hstore option so there is data available in dedicated columns plus many additional attributes stored in the tags column.

This whole block goes into a single column.

'osm_id, "addr:housename" AS housename, "addr:housenumber" AS housenumber, building, name, place,
COALESCE(tags -> ''building:levels'', ''1'') AS levels, COALESCE(tags -> ''building:height'', ''4'') AS height, 
tags, way'

Notice I have worked in some data quality improvements into this query. Two examples:

Goal: Improve column names

The block "addr:housenumber" AS housenumber shows how to rename poorly named columns. Why is addr:housenumber a terrible name? The colon (:) character in the name requires putting the column in double quotes (") in every query. The housenumber name does not have the same requirement.

Goal: Usable data from HSTORE

Depending on the data you want to access from OpenStreetMap, you may need to look into the tags HSTORE column. In the case of buildings, I want to have the building height available. I do that with the following code, and add a COALESCE to set a default building height of 4 meters when it isn't set in the data.

COALESCE(tags -> ''building:height'', ''4'') AS height

Note the double single quotes (') above are required for proper escaping to INSERT the data.

Layer Detail

Layer details are sub-categories of the table's main group. These are setup to be similar to GIS feature classes. The layer detail defines the expected spatial types (point/line/polygon) and the WHERE clause used to identify the proper rows.

Example layer details

The two following code examples illustrate how the pgosm.layer_detail relates to pgosm.layer_group. The first block adds a detail layer with the code 1500 to represent buildings as a polygon layer and identifies those rows with the filter building IS NOT NULL.

INSERT INTO pgosm.layer_detail (code, subclass, geom_point, geom_line, geom_polygon, osm_tag_filter, description) VALUES ('1500','building','FALSE','FALSE','TRUE','building IS NOT NULL','');

They second detail layer is a point layer with code 1551 for building entrances. Notice the filter added to osm_tag_filter is far more complex than the prior building example.

INSERT INTO pgosm.layer_detail (code, subclass, geom_point, geom_line, geom_polygon, osm_tag_filter, description) VALUES ('1551','entrance','TRUE','FALSE','FALSE','building = ''entrance'' OR (tags -> ''entrance'' IN (''main'',''yes'' ))','');

Create link between tables

The pgosm.layer_detail table has a NULL-able foreign key to pgosm.layer_goup. This command will update the layer to set the foreign key values.

UPDATE pgosm.layer_detail ld
    SET layer_group_id = l.layer_group_id
    FROM pgosm.layer_group l
        WHERE ld.code BETWEEN l.code_start AND l.code_end
;

I remember writing this code back in 2015 thinking I would come back and make so many things better, cleaner and "the right way." The step above is a perfect example...

A practical example

Copy and paste the following code into a file named pgosm_county_reduction_definition.sql.

This code creates three groups, each with one detail layer filtered for county polygons. The first INSERT query adds the three groups, the important difference is how the geometry column is handled. In the first one (code 1106) we just use way as-is. The second (code 1150) simplifies using a tolerance of 10, and the third (code 1151) simplifies with a tolerance of 100. The final query handles establishing the relationship between the two.

Why simplify your spatial data? See my Tame your spatial data posts: Part 1; Part 2

INSERT INTO pgosm.layer_group (code_start, code_end, class, osm_columns, description)
    VALUES ('1106','1106','county','osm_id, name, admin_level,
                boundary,  way',''),
            ('1150','1150','county_thematic10','osm_id, name, admin_level,
                boundary,  ST_Simplify(way, 10) AS way',''),
            ('1151','1151','county_thematic100','osm_id, name, admin_level,
                boundary,  ST_Simplify(way, 100) AS way','')
;

INSERT INTO pgosm.layer_detail (code, subclass, geom_point, geom_line, geom_polygon, 
        osm_tag_filter, description)
    VALUES ('1106','county','FALSE','FALSE','TRUE',
                'boundary=''administrative'' AND admin_level=''6''',''),
            ('1150','county','FALSE','FALSE','TRUE',
                'boundary=''administrative'' AND admin_level=''6''',''),
            ('1151','county','FALSE','FALSE','TRUE',
                'boundary=''administrative'' AND admin_level=''6''','')
;

UPDATE pgosm.layer_detail ld
    SET layer_group_id = l.layer_group_id
    FROM pgosm.layer_group l
        WHERE ld.code BETWEEN l.code_start AND l.code_end
;

In a clean load of pgosm database, load the layer definition data and finish the process as described in my prior post.

psql -d pgosm -f pgosm_county_reduction_definition.sql

When the process is completed there are three (3) tables in the osm schema.

\dt+ osm.
                                  List of relations
┌────────┬────────────────────────────┬───────┬──────────────┬────────┬─────────────┐
│ Schema │            Name            │ Type  │    Owner     │  Size  │ Description │
╞════════╪════════════════════════════╪═══════╪══════════════╪════════╪═════════════╡
│ osm    │ county_polygon             │ table │ postgres     │ 528 kB │             │
│ osm    │ county_thematic100_polygon │ table │ postgres     │ 144 kB │             │
│ osm    │ county_thematic10_polygon  │ table │ postgres     │ 328 kB │             │
└────────┴────────────────────────────┴───────┴──────────────┴────────┴─────────────┘
(3 rows)

Table downloads

I used pg_dump to create raw dumps of these individual tables. They are available for download in the table below.

File Size (compressed) Size (uncompressed) md5sum
Original polygon 417 kB 1.4MB Download checksum
Simple-10 polygon 226 kB 777kB Download checksum
Simple-100 polygon 63 kB 242kB Download checksum

Summary

This post explained the logic behind the data loaded to the PgOSM database during processing and how to customize the data. I finished by showing an example of how new layer sets can be created to simplify the data.

It has been a lot of fun getting this project out into the wild. There are still plenty of rough edges, but the process of sharing and writing about it has forced encouraged me to continue to make the project better for my own purposes and benefit.

I hope others will find this project helpful and enable them to more effectively use OpenStreetMap data in PostGIS.

Need help with your PostgreSQL servers or databases? Contact us to start the conversation!

By Ryan Lambert
Published February 21, 2019
Last Updated January 22, 2023