Can you use ltree for Nested Place Data?
The topic of the ltree
data type
has come up a few times recently. This intersects with a common type of query
used in PostGIS: nested geometries. An example of nested geometries
is the state of Colorado exists within the United States. The PgOSM Flex
project calculates and stores
nested polygon data from OpenStreetMap places into a handful of array
(TEXT[]
, BIGINT[]
) columns.
I decided to explore ltree
to see if it would be a suitable option for
PgOSM Flex nested places.
Spoiler alert: ltree
is not suitable for OpenStreetMap data in the way I would
want to use it.
Nested data in arrays
The following is what the "Colorado is in the U.S" would look like using a
Postgres TEXT[]
array:
{"United States","Colorado"}
This is how PgOSM Flex stores nested places currently.
A record like the above can be found using WHERE ARRAY['Colorado'] && name_path
.
My post from 2021 has
examples of queries
like this under the Prepare the nested data section.
Using ltree
To use ltree
, create the extension.
CREATE EXTENSION ltree;
A simple SELECT
query cast to the ltree
type shows basically what you'd expect,
it looks just like the text input. The advantage with the ltree type are all
the operators and functions available.
That functionality is why this approach is tempting.
SELECT 'hello.world'::ltree;
┌─────────────┐
│ ltree │
╞═════════════╡
│ hello.world │
└─────────────┘
The key limitation to ltree
, and why it does not work well with OpenStreetMap data,
are the restrictions on characters. The ltree
labels only accept a limited
range of alphanumeric characters. The documentation states:
Valid alphanumeric character ranges are dependent on the database locale. For example, in C locale, the characters A-Za-z0-9_- are allowed." - Postgres Docs
It didn't take long to realize spaces are not supported. Trying to use "United States"
with the ltree
data type results in a syntax error.
SELECT 'United States.Colorado'::ltree;
ERROR: ltree syntax error at character 7
LINE 1: SELECT 'United States.Colorado'::ltree;
Sure, you can remove spaces and it will work. However, this approach means you need to inject a layer of logic between what the user wants to query and what can be queried.
SELECT 'UnitedStates.Colorado'::ltree;
┌───────────────────────┐
│ ltree │
╞═══════════════════════╡
│ UnitedStates.Colorado │
└───────────────────────┘
The next thing I checked was international support. I was a pleasantly surprised to see this works.
SELECT 'hello.横浜市'::ltree;
┌──────────────┐
│ ltree │
╞══════════════╡
│ hello.横浜市 │
└──────────────┘
While spaces are easy to remove and international support appears to work,
it still isn't that simple. The following query shows one path from Colorado
sanitized of its spaces. Unfortunately, attempting to cast
this to results in the ltree syntax error
message because of the #
symbol.
SELECT replace(array_to_string(name_path, '.', ''), ' ', '')
FROM osm_co.place_polygon_nested
WHERE name like '%Windler Sub #01%'
;
┌──────────────────────────────────┐
│ replace │
╞══════════════════════════════════╡
│ AdamsCounty.Aurora.WindlerSub#01 │
└──────────────────────────────────┘
Again, finding and removing unsupported characters isn't impossible. It certainly isn't desireable, either.
One more pivot
After the special characters, I thought "Certainly the BIGINT
primary key
will provide a way to use ltree
.
This also failed to pan out. PgOSM Flex follows a long standing convention
in OpenStreetMap data loaded to PostGIS of using positive IDs represent the nodes and ways
(points, lines, polygons) and negative IDs represent relations.
This allows representing the full set of data in a single ID space without collision.
Negative BIGINT
s do not work, something like '-1.2.-5'::LTREE
(for id's -1
, 2
, -5
) because of the same ltree syntax error
.
Too many hoops
The type of data nested in OpenStreetMap involves natural language with
diverse characters and BIGINT
s with negative values.
The wide range of characters conflict with the requirements to use ltree
.
It would be technically possible to get there by altering the data to only include approved characters, but that's a rabbit hole I'm simply not interested in. Going down that route would impact every manner of querying the data. You could also create a new key for nesting, but I don't see that adding the extra layers will provide the usability benefits I want from this.
Summary
I started working on this post assuming I'd be spending time doing some performance
comparisons between the data in the current TEXT[]
format against the ltree
format. I even deployed a test database with a the U.S West region for testing!
It was while that data was processing when I discovered that spaces aren't supported.
Not all experiments work out as expected!
Need help with your PostgreSQL servers or databases? Contact us to start the conversation!