My book Mastering PostGIS and OpenStreetMap is available!

Can you use ltree for Nested Place Data?

By Ryan Lambert -- Published February 29, 2024

The topic of the ltree data type has come up a few times recently. This intersects with a common type of query used in PostGIS: nested geometries. An example of nested geometries is the state of Colorado exists within the United States. The PgOSM Flex project calculates and stores nested polygon data from OpenStreetMap places into a handful of array (TEXT[], BIGINT[]) columns. I decided to explore ltree to see if it would be a suitable option for PgOSM Flex nested places.

Spoiler alert: ltree is not suitable for OpenStreetMap data in the way I would want to use it.

Nested data in arrays

The following is what the "Colorado is in the U.S" would look like using a Postgres TEXT[] array:

{"United States","Colorado"}

This is how PgOSM Flex stores nested places currently. A record like the above can be found using WHERE ARRAY['Colorado'] && name_path. My post from 2021 has examples of queries like this under the Prepare the nested data section.

Using ltree

To use ltree, create the extension.

CREATE EXTENSION ltree;

A simple SELECT query cast to the ltree type shows basically what you'd expect, it looks just like the text input. The advantage with the ltree type are all the operators and functions available. That functionality is why this approach is tempting.

SELECT 'hello.world'::ltree;

┌─────────────┐
│    ltree    │
╞═════════════╡
│ hello.world │
└─────────────┘

The key limitation to ltree, and why it does not work well with OpenStreetMap data, are the restrictions on characters. The ltree labels only accept a limited range of alphanumeric characters. The documentation states:

Valid alphanumeric character ranges are dependent on the database locale. For example, in C locale, the characters A-Za-z0-9_- are allowed." - Postgres Docs

It didn't take long to realize spaces are not supported. Trying to use "United States" with the ltree data type results in a syntax error.

SELECT 'United States.Colorado'::ltree;

ERROR:  ltree syntax error at character 7
LINE 1: SELECT 'United States.Colorado'::ltree;

Sure, you can remove spaces and it will work. However, this approach means you need to inject a layer of logic between what the user wants to query and what can be queried.

SELECT 'UnitedStates.Colorado'::ltree;

┌───────────────────────┐
│         ltree         │
╞═══════════════════════╡
│ UnitedStates.Colorado │
└───────────────────────┘

The next thing I checked was international support. I was a pleasantly surprised to see this works.

SELECT 'hello.横浜市'::ltree;

┌──────────────┐
│    ltree     │
╞══════════════╡
│ hello.横浜市  │
└──────────────┘

While spaces are easy to remove and international support appears to work, it still isn't that simple. The following query shows one path from Colorado sanitized of its spaces. Unfortunately, attempting to cast this to results in the ltree syntax error message because of the # symbol.

SELECT replace(array_to_string(name_path, '.', ''), ' ', '')
    FROM osm_co.place_polygon_nested
    WHERE name like '%Windler Sub #01%'
;

┌──────────────────────────────────┐
│             replace              │
╞══════════════════════════════════╡
│ AdamsCounty.Aurora.WindlerSub#01 │
└──────────────────────────────────┘

Again, finding and removing unsupported characters isn't impossible. It certainly isn't desireable, either.

One more pivot

After the special characters, I thought "Certainly the BIGINT primary key will provide a way to use ltree. This also failed to pan out. PgOSM Flex follows a long standing convention in OpenStreetMap data loaded to PostGIS of using positive IDs represent the nodes and ways (points, lines, polygons) and negative IDs represent relations. This allows representing the full set of data in a single ID space without collision. Negative BIGINTs do not work, something like '-1.2.-5'::LTREE (for id's -1, 2, -5) because of the same ltree syntax error.

Too many hoops

The type of data nested in OpenStreetMap involves natural language with diverse characters and BIGINTs with negative values. The wide range of characters conflict with the requirements to use ltree.

It would be technically possible to get there by altering the data to only include approved characters, but that's a rabbit hole I'm simply not interested in. Going down that route would impact every manner of querying the data. You could also create a new key for nesting, but I don't see that adding the extra layers will provide the usability benefits I want from this.

Summary

I started working on this post assuming I'd be spending time doing some performance comparisons between the data in the current TEXT[] format against the ltree format. I even deployed a test database with a the U.S West region for testing! It was while that data was processing when I discovered that spaces aren't supported.

Not all experiments work out as expected!

Need help with your PostgreSQL servers or databases? Contact us to start the conversation!

By Ryan Lambert
Published February 29, 2024
Last Updated February 29, 2024