RustProof Labs: blogging for education (logo)

Improved OpenStreetMap data structure in PostGIS

By Ryan Lambert -- Published January 03, 2021

It was nearly a decade ago when I first loaded OpenStreetMap data to PostGIS. Over the years my fingers have typed osm2pgsql --slim --drop ... countless times and I do not see an end to that trend anytime soon. One thing that is changing is that getting high quality OpenStreetMap data into PostGIS is easier than ever! This improvement in data quality is made possible by the new Flex output available in osm2pgsql 1.4.0, I wrote about my initial impressions of the Flex output a few weeks ago.

This post looks at how I am starting to use osm2pgsql's Flex output to provide a standardized and sanitized OpenStreetMap data set in Postgres/PostGIS. No longer is osm2pgsql limited to loading data to the 3-table structure (planet_osm_point, planet_osm_line and planet_osm_polygon) so I am eagerly converting to the Flex output and taking advantage of these changes! It is also easier than ever to create customized mix-and-match data loads for customized needs of specific projects.

PgOSM-Flex

The PgOSM-Flex repo provides a collection of Lua and SQL scripts to provide high quality OpenStreetMap datasets in PostGIS. PgOSM-Flex leverages the osm2pgsql Flex output with the Lua scripts performing the heavy lifting. The accompanying SQL scripts perform additional post-import steps for adding database constraints, views and calculations not suitable (or possible) in the Lua scripts. As osm2pgsql evolves, PgOSM-Flex will evolve with it.

I hope others will be find this project helpful to kick-start their OpenStreetMap data in PostGIS. If there are layers that should be added, or or existing layers need additional attributes (columns), please submit feedback as an issue or fork the project and create a pull request.

Structural Opinions

Like any properly designed data structure, PgOSM-Flex has opinions! These opinions come in the form of columns with sensible data types, PRIMARY KEY columns, sparse use of JSONB, and other forms of standardization throughout.

A few core opinions:

When possible, the opinions enforced by PgOSM-Flex are informed by the details from the OpenStreetMap Wiki. Efforts are being made to ensure the comments in Postgres describe the expectations. An example of this is the maxspeed column associated with roads layers (highway=*). The maxspeed column stores speed limits in km/hr, and the comments on the column define it as such.

SELECT s_name, t_name, column_name, description
    FROM dd.columns
    WHERE s_name = 'osm' AND t_name = 'road_major'
        AND column_name = 'maxspeed';

┌────────┬────────────┬─────────────┬──────────────────────────────────────────────────────────────┐
│ s_name │   t_name   │ column_name │                         description                          │
╞════════╪════════════╪═════════════╪══════════════════════════════════════════════════════════════╡
│ osm    │ road_major │ maxspeed    │ Maximum posted speed limit in kilometers per hour (km/kr).  …│
│        │            │             │…Units not enforced by OpenStreetMap.  Please fix values in M…│
│        │            │             │…PH in OpenStreetMap.org to either the value in km/hr OR with…│
│        │            │             │… the suffix "mph" so it can be properly converted.  See http…│
│        │            │             │…s://wiki.openstreetmap.org/wiki/Key:maxspeed                 │
└────────┴────────────┴─────────────┴──────────────────────────────────────────────────────────────┘

The query above uses the PgDD Data Dictionary extension. The same general information can be see in psql using \d+ osm.road_major.

Run main PgOSM-Flex load

This section assumes a few pre-requisites not covered in this post:

Change into the flex-config directory of PgOSM-Flex to run osm2pgsql with the run-all.lua style script. The key difference in the command line arguments from pre-Flex versions is the addition of --output=flex --style=./run-all.lua.

cd pgosm-flex/flex-config
osm2pgsql -d pgosm --slim --drop \
    --output=flex --style=./run-all.lua \
    /path/to/data/colorado-latest.osm.pbf

When osm2pgsql completes (about 2 minutes on my laptop), run the post-load SQL script. Each layer has a .sql script to establish PRIMARY KEY columns, comments to document tables and columns, etc.

psql -d pgosm -f run-all.sql

The five (5) largest tables (by row count) for Colorado.

SELECT s_name, t_name, rows, description
    FROM dd.tables
    WHERE s_name = 'osm'
    ORDER BY rows DESC
    LIMIT 5;
┌────────┬──────────────────┬──────────────┬──────────────────────────────────────────────────────────────────┐
│ s_name │      t_name      │     rows     │                           description                            │
╞════════╪══════════════════╪══════════════╪══════════════════════════════════════════════════════════════════╡
│ osm    │ tags             │      3158296 │ OpenStreetMap tag data for all objects in source file.  Key/valu…│
│        │                  │              │…e data stored in tags column in JSONB format.                    │
│ osm    │ road_line        │       793082 │ OpenStreetMap roads, full layer.  Generated by osm2pgsql Flex ou…│
│        │                  │              │…tput using pgosm-flex/flex-config/road.lua                       │
│ osm    │ building_polygon │       787359 │ OpenStreetMap building polygons - all polygons with a building t…│
│        │                  │              │…ag.  Generated by osm2pgsql Flex output using pgosm-flex/flex-co…│
│        │                  │              │…nfig/building.lua                                                │
│ osm    │ natural_point    │       290856 │ OpenStreetMap natural points, e.g. trees, peaks, etc..  Generate…│
│        │                  │              │…d by osm2pgsql Flex output using pgosm-flex/flex-config/natural.…│
│        │                  │              │…lua                                                              │
│ osm    │ traffic_point    │       189048 │ OpenStreetMap traffic related points.  Primarily "highway" tags …│
│        │                  │              │…but includes multiple.  Generated by osm2pgsql Flex output using…│
│        │                  │              │… pgosm-flex/flex-config/traffic.lua                              │
└────────┴──────────────────┴──────────────┴──────────────────────────────────────────────────────────────────┘

All Tags table

The osm.tags table has columns for osm_id and geom_type (Node, Way, Relation), along with the tags column in JSONB. It does not contain the geom column. This can be used to find unstructured data not currently brought over to the main feature tables. This query illustrates a join to the tags table from osm.building_polygon and the additional data it can contain.

SELECT b.osm_id, b.osm_type, b.name, b.height, b.wheelchair,
        jsonb_pretty(t.tags), b.geom
    FROM osm.building_polygon b
    INNER JOIN osm.tags t ON b.osm_id = t.osm_id  AND t.geom_type = 'W'
    WHERE b.osm_id = 25312645
;

┌──────────┬────────────┬────────────┬────────┬────────────┬──────────────────────────────────────┐
│  osm_id  │  osm_type  │    name    │ height │ wheelchair │             jsonb_pretty             │
╞══════════╪════════════╪════════════╪════════╪════════════╪══════════════════════════════════════╡
│ 25312645 │ commercial │ Ball Arena │     22 │ t          │ {                                   ↵│
│          │            │            │        │            │     "ele": "1582",                  ↵│
│          │            │            │        │            │     "name": "Ball Arena",           ↵│
│          │            │            │        │            │     "sport": "basketball;hockey",   ↵│
│          │            │            │        │            │     "height": "22",                 ↵│
│          │            │            │        │            │     "leisure": "stadium",           ↵│
│          │            │            │        │            │     "name:en": "Ball Arena",        ↵│
│          │            │            │        │            │     "name:uk": "Пепсі-центр",       ↵│
│          │            │            │        │            │     "building": "commercial",       ↵│
│          │            │            │        │            │     "old_name": "Pepsi Center",     ↵│
│          │            │            │        │            │     "addr:city": "Denver",          ↵│
│          │            │            │        │            │     "addr:state": "CO",             ↵│
│          │            │            │        │            │     "start_date": "1999",           ↵│
│          │            │            │        │            │     "wheelchair": "yes",            ↵│
│          │            │            │        │            │     "addr:street": "Chopper Circle",↵│
│          │            │            │        │            │     "addr:postcode": "80204",       ↵│
│          │            │            │        │            │     "gnis:feature_id": "2432103",   ↵│
│          │            │            │        │            │     "addr:housenumber": "1000"      ↵│
│          │            │            │        │            │ }                                    │
└──────────┴────────────┴────────────┴────────┴────────────┴──────────────────────────────────────┘
(1 row)

If the tags table is not necessary, use the run-no-tags.lua script instead of the run-all.lua script. This saves a considerable amount of space on disk.

Single layer load

Not everyone needs, or wants, the full data set. instead of using the run-all.lua and .sql script, you can use the style for just a single layer. For example, only load the major roads.

osm2pgsql -d pgosm --slim --drop \
    --output=flex --style=./road_major.lua \
    /path/to/data/colorado-latest.osm.pbf

Run the post-load SQL script.

psql -d pgosm -f road_major.sql

The original road layer with all the footways, hiking trails, and parking aisles contains nearly 800,000 rows. Leaving out the smaller minor roads with the road_major style loads under 80,000 rows.

SELECT t_name, rows, size_plus_indexes, description
        FROM dd.tables
        WHERE s_name = 'osm'
            AND t_name LIKE 'road%'
        ;

┌────────────┬────────┬───────────────────┬───────────────────────────────────────────────────────────────────┐
│   t_name   │  rows  │ size_plus_indexes │                            description                            │
╞════════════╪════════╪═══════════════════╪═══════════════════════════════════════════════════════════════════╡
│ road_line  │ 793082 │ 275 MB            │ OpenStreetMap roads, full layer.  Generated by osm2pgsql Flex out…│
│            │        │                   │…put using pgosm-flex/flex-config/road.lua                         │
│ road_major │  78698 │ 26 MB             │ OpenStreetMap roads - Major only.  Generated by osm2pgsql Flex ou…│
│            │        │                   │…tput using pgosm-flex/flex-config/road_major.lua                  │
└────────────┴────────┴───────────────────┴───────────────────────────────────────────────────────────────────┘

Unitable load

If you know data is missing but need to query around a bit to find it, the unitable.lua style may help. It is like the osm.tags table, but also includes a geom column with every point, line and polygon stuffed into it. It's handy for finding data when you aren't certain exactly what you need, but shouldn't be used for any production-grade purpose.

Final notes

A few final notes that don't really fit, but I don't want to skip them either.


QGIS Style

The QGIS styles are XML data saved in the public.layer_styles table. (Not to be confused with the Flex styles...) Like the original PgOSM projects, PgOSM-Flex includes bundled style data to quickly style data from PostGIS in QGIS. Not all of the legacy styles have been converted yet, some may never be converted. To load the QGIS styles into public.layer_styles table, follow the instructions in the README.


Deep Copy to mix and match styles

My first attempt to try mix-and-match Lua Flex styles fell apart. The trick to getting it to work (again suggested by Jochen Topf!) was to find a Lua method for deep copy. The function deep_copy I have been using is based on copy2 from this GIST by tylerneylon.

-- deep_copy based on copy2: 
function deep_copy(obj)
    if type(obj) ~= 'table' then return obj end
    local res = setmetatable({}, getmetatable(obj))
    for k, v in pairs(obj) do res[deep_copy(k)] = deep_copy(v) end
    return res
end

The deep_copy() function is used with a little bit of logic (below) to allow running the Lua script as one-off styles, or stacking them together like the run-all.lua script does.

if osm2pgsql.process_node == nil then
    -- Change function name here
    osm2pgsql.process_node = building_process_node
else
    local nested = osm2pgsql.process_node
    osm2pgsql.process_node = function(object)
        local object_copy = deep_copy(object)
        nested(object)
        -- Change function name here
        building_process_node(object_copy)
    end
end

Next Steps

The new osm2pgsql Flex output is powerful and enables OpenStreetMap data to be loaded to Postgres with an improved database structure. I need to finish converting the rest of the layers I'll use throughout the year to the new Flex output, and I have a few new layers in mind I want to create! The Lua scripts in place now have room for improvement too, removing duplicated code and taking advantage of the full osm2pgsql processing power.

I am looking forward to the next 10 years of working with OpenStreetMap data in PostGIS!

Need help with your PostgreSQL servers or databases? Contact us to start the conversation!

By Ryan Lambert
Published January 03, 2021
Last Updated January 03, 2020