Improved OpenStreetMap data structure in PostGIS
It was nearly a decade ago when I first loaded OpenStreetMap data to PostGIS.
Over the years my fingers have typed osm2pgsql --slim --drop ...
countless times
and I do not see an end to that trend anytime soon.
One thing that is changing is that getting high quality OpenStreetMap data into
PostGIS is easier than ever!
This improvement in data quality is made possible by the new Flex output available in osm2pgsql 1.4.0,
I wrote about my initial impressions of the Flex output a few weeks ago.
This post looks at how I am starting to use osm2pgsql's Flex output to provide a
standardized and sanitized OpenStreetMap data set in Postgres/PostGIS.
No longer is osm2pgsql limited to loading data to the 3-table structure
(planet_osm_point
, planet_osm_line
and planet_osm_polygon
)
so I am eagerly converting to the Flex output and taking advantage of these changes!
It is also easier than ever to create customized mix-and-match data loads
for customized needs of specific projects.
PgOSM-Flex
The PgOSM-Flex repo provides a collection of Lua and SQL scripts to provide high quality OpenStreetMap datasets in PostGIS. PgOSM-Flex leverages the osm2pgsql Flex output with the Lua scripts performing the heavy lifting. The accompanying SQL scripts perform additional post-import steps for adding database constraints, views and calculations not suitable (or possible) in the Lua scripts. As osm2pgsql evolves, PgOSM-Flex will evolve with it.
I hope others will be find this project helpful to kick-start their OpenStreetMap data in PostGIS. If there are layers that should be added, or or existing layers need additional attributes (columns), please submit feedback as an issue or fork the project and create a pull request.
Structural Opinions
Like any properly designed data structure, PgOSM-Flex has opinions!
These opinions come in the
form of columns with sensible data types, PRIMARY KEY
columns,
sparse use of JSONB
, and other forms of standardization throughout.
A few core opinions:
- Every table has a
PRIMARY KEY
- More columns get
NOT NULL
- Prefer Proper Columns over
JSONB
- Prefer
JSONB
overHSTORE
(aka good-byeHSTORE
!) - Units cleaned and matching OpenStreetMap default units
- Conversion to other units should happen in Generated Columns, Views or even Materialized Views
When possible, the opinions enforced by PgOSM-Flex are informed by the
details from the OpenStreetMap Wiki. Efforts are being made to ensure the comments in Postgres describe the expectations. An example of this is the maxspeed
column associated with
roads layers (highway=*
). The maxspeed
column stores speed limits in km/hr,
and the comments on the column define it as such.
SELECT s_name, t_name, column_name, description
FROM dd.columns
WHERE s_name = 'osm' AND t_name = 'road_major'
AND column_name = 'maxspeed';
┌────────┬────────────┬─────────────┬──────────────────────────────────────────────────────────────┐
│ s_name │ t_name │ column_name │ description │
╞════════╪════════════╪═════════════╪══════════════════════════════════════════════════════════════╡
│ osm │ road_major │ maxspeed │ Maximum posted speed limit in kilometers per hour (km/kr). …│
│ │ │ │…Units not enforced by OpenStreetMap. Please fix values in M…│
│ │ │ │…PH in OpenStreetMap.org to either the value in km/hr OR with…│
│ │ │ │… the suffix "mph" so it can be properly converted. See http…│
│ │ │ │…s://wiki.openstreetmap.org/wiki/Key:maxspeed │
└────────┴────────────┴─────────────┴──────────────────────────────────────────────────────────────┘
The query above uses the PgDD Data Dictionary extension. The same general information can be see in
psql
using\d+ osm.road_major
.
Run main PgOSM-Flex load
This section assumes a few pre-requisites not covered in this post:
- osm2pgsql 1.4.0+ installed
- A local Postgres database named
pgosm
exists with PostGIS installed lua-dkjson
is installed- PgOSM-Flex is cloned locally
colorado-latest.osm.pbf
downloaded from Geofabrik
Change into the flex-config
directory of PgOSM-Flex to run osm2pgsql with
the run-all.lua
style script.
The key difference in the command line arguments
from pre-Flex versions is the addition of --output=flex --style=./run-all.lua
.
cd pgosm-flex/flex-config
osm2pgsql -d pgosm --slim --drop \
--output=flex --style=./run-all.lua \
/path/to/data/colorado-latest.osm.pbf
When osm2pgsql completes (about 2 minutes on my laptop), run the post-load SQL script. Each layer has
a .sql
script to establish PRIMARY KEY
columns, comments to document
tables and columns, etc.
psql -d pgosm -f run-all.sql
The five (5) largest tables (by row count) for Colorado.
SELECT s_name, t_name, rows, description
FROM dd.tables
WHERE s_name = 'osm'
ORDER BY rows DESC
LIMIT 5;
┌────────┬──────────────────┬──────────────┬──────────────────────────────────────────────────────────────────┐
│ s_name │ t_name │ rows │ description │
╞════════╪══════════════════╪══════════════╪══════════════════════════════════════════════════════════════════╡
│ osm │ tags │ 3158296 │ OpenStreetMap tag data for all objects in source file. Key/valu…│
│ │ │ │…e data stored in tags column in JSONB format. │
│ osm │ road_line │ 793082 │ OpenStreetMap roads, full layer. Generated by osm2pgsql Flex ou…│
│ │ │ │…tput using pgosm-flex/flex-config/road.lua │
│ osm │ building_polygon │ 787359 │ OpenStreetMap building polygons - all polygons with a building t…│
│ │ │ │…ag. Generated by osm2pgsql Flex output using pgosm-flex/flex-co…│
│ │ │ │…nfig/building.lua │
│ osm │ natural_point │ 290856 │ OpenStreetMap natural points, e.g. trees, peaks, etc.. Generate…│
│ │ │ │…d by osm2pgsql Flex output using pgosm-flex/flex-config/natural.…│
│ │ │ │…lua │
│ osm │ traffic_point │ 189048 │ OpenStreetMap traffic related points. Primarily "highway" tags …│
│ │ │ │…but includes multiple. Generated by osm2pgsql Flex output using…│
│ │ │ │… pgosm-flex/flex-config/traffic.lua │
└────────┴──────────────────┴──────────────┴──────────────────────────────────────────────────────────────────┘
All Tags table
The osm.tags
table has columns for osm_id
and geom_type
(N
ode, W
ay, R
elation), along with the tags
column in JSONB. It does not contain the
geom
column. This can be used to find unstructured data
not currently brought over to the main feature tables.
This query illustrates a join to the tags
table from osm.building_polygon
and the additional data it can contain.
SELECT b.osm_id, b.osm_type, b.name, b.height, b.wheelchair,
jsonb_pretty(t.tags), b.geom
FROM osm.building_polygon b
INNER JOIN osm.tags t ON b.osm_id = t.osm_id AND t.geom_type = 'W'
WHERE b.osm_id = 25312645
;
┌──────────┬────────────┬────────────┬────────┬────────────┬──────────────────────────────────────┐
│ osm_id │ osm_type │ name │ height │ wheelchair │ jsonb_pretty │
╞══════════╪════════════╪════════════╪════════╪════════════╪══════════════════════════════════════╡
│ 25312645 │ commercial │ Ball Arena │ 22 │ t │ { ↵│
│ │ │ │ │ │ "ele": "1582", ↵│
│ │ │ │ │ │ "name": "Ball Arena", ↵│
│ │ │ │ │ │ "sport": "basketball;hockey", ↵│
│ │ │ │ │ │ "height": "22", ↵│
│ │ │ │ │ │ "leisure": "stadium", ↵│
│ │ │ │ │ │ "name:en": "Ball Arena", ↵│
│ │ │ │ │ │ "name:uk": "Пепсі-центр", ↵│
│ │ │ │ │ │ "building": "commercial", ↵│
│ │ │ │ │ │ "old_name": "Pepsi Center", ↵│
│ │ │ │ │ │ "addr:city": "Denver", ↵│
│ │ │ │ │ │ "addr:state": "CO", ↵│
│ │ │ │ │ │ "start_date": "1999", ↵│
│ │ │ │ │ │ "wheelchair": "yes", ↵│
│ │ │ │ │ │ "addr:street": "Chopper Circle",↵│
│ │ │ │ │ │ "addr:postcode": "80204", ↵│
│ │ │ │ │ │ "gnis:feature_id": "2432103", ↵│
│ │ │ │ │ │ "addr:housenumber": "1000" ↵│
│ │ │ │ │ │ } │
└──────────┴────────────┴────────────┴────────┴────────────┴──────────────────────────────────────┘
(1 row)
If the
tags
table is not necessary, use therun-no-tags.lua
script instead of therun-all.lua
script. This saves a considerable amount of space on disk.
Single layer load
Not everyone needs, or wants, the full data set. instead of using the
run-all.lua
and .sql
script, you can use the style for just a single layer.
For example, only load the major roads.
osm2pgsql -d pgosm --slim --drop \
--output=flex --style=./road_major.lua \
/path/to/data/colorado-latest.osm.pbf
Run the post-load SQL script.
psql -d pgosm -f road_major.sql
The original road layer with all the footways, hiking trails, and parking aisles
contains nearly 800,000 rows. Leaving out the smaller minor roads with
the road_major
style loads under 80,000 rows.
SELECT t_name, rows, size_plus_indexes, description
FROM dd.tables
WHERE s_name = 'osm'
AND t_name LIKE 'road%'
;
┌────────────┬────────┬───────────────────┬───────────────────────────────────────────────────────────────────┐
│ t_name │ rows │ size_plus_indexes │ description │
╞════════════╪════════╪═══════════════════╪═══════════════════════════════════════════════════════════════════╡
│ road_line │ 793082 │ 275 MB │ OpenStreetMap roads, full layer. Generated by osm2pgsql Flex out…│
│ │ │ │…put using pgosm-flex/flex-config/road.lua │
│ road_major │ 78698 │ 26 MB │ OpenStreetMap roads - Major only. Generated by osm2pgsql Flex ou…│
│ │ │ │…tput using pgosm-flex/flex-config/road_major.lua │
└────────────┴────────┴───────────────────┴───────────────────────────────────────────────────────────────────┘
Unitable load
If you know data is missing but need to query around a bit to find it, the
unitable.lua
style may help. It is like the osm.tags
table, but also includes
a geom
column with every point, line and polygon stuffed into it.
It's handy for finding data when you aren't certain exactly what you need,
but shouldn't be used for any production-grade purpose.
Final notes
A few final notes that don't really fit, but I don't want to skip them either.
QGIS Style
The QGIS styles are XML data saved in the
public.layer_styles
table. (Not to be confused with the Flex styles...)
Like the original PgOSM projects, PgOSM-Flex includes bundled style data to
quickly style data from PostGIS in QGIS.
Not all of the legacy styles have been converted yet, some may never be converted.
To load the QGIS styles into public.layer_styles
table, follow the instructions
in the README.
Deep Copy to mix and match styles
My first attempt to try mix-and-match Lua Flex styles fell apart. The trick
to getting it to work
(again suggested by Jochen Topf!) was to find a Lua method for deep copy. The function deep_copy
I have been using is based on copy2
from this GIST by tylerneylon.
-- deep_copy based on copy2:
function deep_copy(obj)
if type(obj) ~= 'table' then return obj end
local res = setmetatable({}, getmetatable(obj))
for k, v in pairs(obj) do res[deep_copy(k)] = deep_copy(v) end
return res
end
The deep_copy()
function is used with a little bit of logic (below) to allow
running the Lua script as one-off styles, or stacking them together like the run-all.lua
script does.
if osm2pgsql.process_node == nil then
-- Change function name here
osm2pgsql.process_node = building_process_node
else
local nested = osm2pgsql.process_node
osm2pgsql.process_node = function(object)
local object_copy = deep_copy(object)
nested(object)
-- Change function name here
building_process_node(object_copy)
end
end
Next Steps
The new osm2pgsql Flex output is powerful and enables OpenStreetMap data to be loaded to Postgres with an improved database structure. I need to finish converting the rest of the layers I'll use throughout the year to the new Flex output, and I have a few new layers in mind I want to create! The Lua scripts in place now have room for improvement too, removing duplicated code and taking advantage of the full osm2pgsql processing power.
I am looking forward to the next 10 years of working with OpenStreetMap data in PostGIS!
Need help with your PostgreSQL servers or databases? Contact us to start the conversation!