Load the Right Amount of OpenStreetMap Data
Populating a PostGIS database with OpenStreetMap data is a favorite way to start a new geospatial project. Loading a region of OpenStreetMap data provides roads, buildings, water features, amenities, and so much more! The breadth and bulk of that data is great, but it can become a hindrance, especially for projects focused on smaller regions. This post explores how to use PgOSM Flex with custom layersets, multiple schemas, and osmium. The goal is to load limited data for a larger region while loading detailed data for a smaller, target region.
The larger region for this post is the Colorado extract from Geofabrik. The smaller region is the Fort Collins area, extracted from the Colorado file. The following image shows the data loaded in this post with two maps side-by-side: the minimal data loaded for all of Colorado is on the left and the full detail for Fort Collins is on the right.
Setup Custom Layersets
The first thing to do is set up custom layersets for PgOSM Flex. I create the ~/custom-layerset directory to hold the .ini files. The first layerset is defined in ~/custom-layerset/large-region.ini with the following contents. It enables the place and road_major layers.
[layerset]
place=true
road_major=true
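For reference, creating that directory and file from the shell could look like the following minimal sketch, using the same path and contents shown above.

mkdir -p ~/custom-layerset

# Write the large-region layerset exactly as shown above
cat > ~/custom-layerset/large-region.ini <<'EOF'
[layerset]
place=true
road_major=true
EOF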
The second layerset is defined in ~/custom-layerset/local-region.ini. I started this layerset with the everything.ini included with PgOSM Flex and removed some layers not needed for the project at hand.
[layerset]
amenity=true
building=true
indoor=true
infrastructure=true
landuse=true
leisure=true
natural=true
place=true
public_transport=true
road=true
shop=true
shop_combined_point=true
tags=true
traffic=true
water=true
Run PgOSM Flex
This post uses PgOSM Flex version 0.10.1 to load data into Postgres 15 running locally. The instructions for using an external Postgres connection explain how to use the Docker container to load data into a Postgres instance of your choice, bypassing the built-in database.
The docker run command needs environment variables set; I do so with this source command.
source ~/.pgosm-db-pgosm-dev
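The contents of that file are not shown here. A hypothetical example, with placeholder values you would replace with your own connection details, might look like this.

# Placeholder values only -- replace with your own connection details
export POSTGRES_USER=your_user
export POSTGRES_PASSWORD=your_password
export POSTGRES_HOST=your-postgres-host   # must be reachable from inside the Docker container
export POSTGRES_DB=pgosm_dev
export POSTGRES_PORT=5432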
Run the PgOSM Flex Docker image using the environment variables from the above command. This docker run command combines details from the External Postgres and Custom Layerset sections of the documentation.
docker run --name pgosm -d --rm \
-v ~/pgosm-data:/app/output \
-v /etc/localtime:/etc/localtime:ro \
-e POSTGRES_USER=$POSTGRES_USER \
-e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \
-e POSTGRES_HOST=$POSTGRES_HOST \
-e POSTGRES_DB=$POSTGRES_DB \
-e POSTGRES_PORT=$POSTGRES_PORT \
-v ~/custom-layerset:/custom-layerset \
    -p 5433:5432 rustprooflabs/pgosm-flex
It's always worth checking that the Docker container is running with docker ps -a.
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e7f80926a823 rustprooflabs/pgosm-flex "docker-entrypoint.s…" 5 seconds ago Up 4 seconds 0.0.0.0:5433->5432/tcp, :::5433->5432/tcp pgosm
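If the container is missing from that list, or shows an exited status, the container logs are the first place to look. This is standard Docker troubleshooting, not anything specific to PgOSM Flex.

docker logs pgosm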
Load larger region
The following docker exec command loads the Colorado region using the minimized large-region layerset defined above. The command uses the new --schema-name option added in v0.10.1 to save the data into the osm_co schema instead of the default osm.
time docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--region=north-america/us \
--subregion=colorado \
--schema-name=osm_co \
--layerset=large-region \
--layerset-path=/custom-layerset/
The above completes on my laptop in 41 seconds and creates 6 tables. The resulting data in the osm_co schema is only 50 MB. Compare this to Colorado with the default layerset, which weighs in at more than 2 GB and takes closer to 10 minutes to load.
SELECT s_name, table_count, size_plus_indexes
FROM dd.schemas
WHERE s_name LIKE 'osm%'
;
┌────────┬─────────────┬───────────────────┐
│ s_name │ table_count │ size_plus_indexes │
╞════════╪═════════════╪═══════════════════╡
│ osm_co │ 6 │ 50 MB │
└────────┴─────────────┴───────────────────┘
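To see how that 50 MB is distributed across the 6 tables, a plain catalog query works against any Postgres database. The sketch below reuses the connection variables sourced earlier and may prompt for the password.

psql -h $POSTGRES_HOST -p $POSTGRES_PORT -U $POSTGRES_USER -d $POSTGRES_DB \
    -c "SELECT tablename,
               pg_size_pretty(pg_total_relation_size(format('%I.%I', schemaname, tablename))) AS total_size
        FROM pg_tables
        WHERE schemaname = 'osm_co'
        ORDER BY pg_total_relation_size(format('%I.%I', schemaname, tablename)) DESC;"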
Load local region
The next step is to extract Fort Collins area data from the larger Colorado region.
The osmium tool with the --bbox option makes this easy enough.
cd ~/pgosm-data
osmium extract --bbox=-105.19,40.47,-104.98,40.64 \
-o co-ft-collins.osm.pbf \
colorado-2023-08-24.osm.pbf
I used Geofabrik's tile calculator tool to get the bounding box used above.
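Before loading the new file it can be worth a quick sanity check with osmium fileinfo. The --extended flag scans the whole file and reports object counts along with the header information.

# Scans the whole extract; reports counts of nodes, ways, and relations
osmium fileinfo --extended co-ft-collins.osm.pbf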
Run PgOSM Flex again using --schema-name=osm_foco and add the --input-file option to point to the file created by osmium. Using the local-region layerset, this loads far more data for Fort Collins than was loaded for the larger Colorado region.
time docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--region=north-america/us \
--subregion=colorado/ft-collins \
--input-file=co-ft-collins.osm.pbf \
--pgosm-date=2023-08-24 \
--schema-name=osm_foco \
--layerset=local-region \
--layerset-path=/custom-layerset/
The above command runs in about 30 seconds. Re-running the query against dd.schemas shows that the osm_foco schema has 39 tables and takes almost 70 MB on disk.
┌──────────┬─────────────┬───────────────────┐
│ s_name │ table_count │ size_plus_indexes │
╞══════════╪═════════════╪═══════════════════╡
│ osm_co │ 6 │ 50 MB │
│ osm_foco │ 39 │ 69 MB │
└──────────┴─────────────┴───────────────────┘
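With both schemas loaded, detailed analysis can target osm_foco while osm_co provides the broader backdrop. As a quick spot check of the detailed load, the query below counts building polygons for Fort Collins; it assumes the building layer created a table named building_polygon, so check the table names in your osm_foco schema if your PgOSM Flex version names them differently.

psql -h $POSTGRES_HOST -p $POSTGRES_PORT -U $POSTGRES_USER -d $POSTGRES_DB \
    -c "SELECT COUNT(*) AS building_count FROM osm_foco.building_polygon;"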
Let's revisit the image from the beginning of this post. A wide range of OpenStreetMap data is available for the Fort Collins region, while the wider Colorado region has only major roads and place details. This sets up a project for detailed spatial analysis and visualization of the Fort Collins area, while allowing visualizations to extend beyond the borders of the detailed data when desired.
Summary
The technique of loading different detail levels for different regions has a wide range of practical uses. Being able to load multiple, targeted subregions with detailed data lets the casual spatial analyst explore without the longer wait times or increased hardware requirements of loading entire larger regions.