Find your local SRID in PostGIS
The past few weeks I had been tossing around some ideas that resulted in me
looking for a particular data set. I needed to get the
bounding boxes
for the most commonly used SRIDs
(Spatial Reference IDentifier)
in PostGIS to join with the
public.spatial_ref_sys
table. My hope was to be able to use the data to quickly identify local
SRIDs for geometries spreading across the U.S. This data was needed to support
another idea where I want both accurate spatial calculations and the best possible
performance when working with large OpenStreetMap data sets.
The good news is now I have the exact data I was looking for. The unexpected bonus is that there is a much broader use case for this data in providing an easy way to find which SRIDs might be appropriate for a specific area!
This post explores this new data with an example of how to use it with pre-existing spatial data.
Finding the data source
After spending some time looking and asking around, turning up nothing in bulk, the sole source of this information seemed to be https://spatialreference.org. For example, SRID 2772 has this listed at the top of the page:
- WGS84 Bounds: -109.0500, 39.5600, -102.0500, 41.0000
This is exactly the detail I needed! I double checked these bounds with against my expectations by putting these coordinates into a query to verify this lines up with the area of Northern Colorado I would expect for SRID 2772.
SELECT 2772 AS srid,
ST_SetSRID(
ST_MakeBox2D(
ST_Point(-109.05, 39.56),
ST_Point( -102.05, 41.00)),
4326) AS geom
;
This post is part of the series PostgreSQL: From Idea to Database.
Load SRID polygons into PostGIS
The SRID bounding box data is available on GitHub
in a .sql
file for easy loading into PostGIS.
The repo also includes a .geojson
file for use with other systems.
Getting the data into Postgres (with PostGIS already installed) is two simple
commands to download and import.
wget https://raw.githubusercontent.com/rustprooflabs/srid-bbox/main/srid_bbox.sql
psql -d pgosm -f srid_bbox.sql
This creates a table with the polygon data and creates a view public.srid_units
for exploring SRIDs, their units (meters, feet, decimal degrees) and the bounding
box when it is available. A previous version of this view is available
as a gist
without the SRID bounding box information.
How to Use
To start exploring the data I filter on the point of Fort Collins, Colorado
and join to the the public.srid_units
view. This returns 21
SRIDs to consider. Most SRIDs (14) have units in meters, the rest are split
between decimal degrees (5) and feet (4).
SELECT p.name, s.units, COUNT(*)
FROM osm_co.place_point p
INNER JOIN public.srid_units s
ON ST_Contains(s.geom, p.way)
WHERE p.name = 'Fort Collins' AND p.place = 'city'
AND s.srtext NOT ILIKE '%deprecated%'
GROUP BY p.name, s.units
ORDER BY COUNT(*) DESC
;
Results.
┌──────────────┬─────────────────┬───────┐
│ name │ units │ count │
╞══════════════╪═════════════════╪═══════╡
│ Fort Collins │ Meters │ 14 │
│ Fort Collins │ Decimal Degrees │ 5 │
│ Fort Collins │ Feet │ 4 │
└──────────────┴─────────────────┴───────┘
Knowing that I want units to be in meters, the query can be updated to explore the 14 options in meters a bit closer. Let's see how the bounding boxes relate to the area of interest.
SELECT p.name, s.srid, s.geom_area,
LEFT(s.srtext, 50) AS srtext, p.way, s.geom
FROM osm_co.place_point p
INNER JOIN public.srid_units s
ON ST_Contains(s.geom, p.way)
AND s.units = 'Meters'
WHERE p.name = 'Fort Collins'
AND p.place = 'city'
AND s.srtext NOT ILIKE '%deprecated%'
;
Obviously we have some worldwide results we probably don't want.
Look closer at the details to continue narrowing down the list to eventually pick a single SRID to use.
┌──────────────┬───────┬────────────────────┬────────────────────────────────────────────────────┐
│ name │ srid │ geom_area │ srtext │
╞══════════════╪═══════╪════════════════════╪════════════════════════════════════════════════════╡
│ Fort Collins │ 3410 │ 0 │ PROJCS["NSIDC EASE-Grid Global",GEOGCS["Unspecifie │
│ Fort Collins │ 3395 │ 0 │ PROJCS["WGS 84 / World Mercator",GEOGCS["WGS 84",D │
│ Fort Collins │ 3503 │ 95155903541.62695 │ PROJCS["NAD83(NSRS2007) / Colorado North",GEOGCS[" │
│ Fort Collins │ 26953 │ 95155903541.62695 │ PROJCS["NAD83 / Colorado North",GEOGCS["NAD83",DAT │
│ Fort Collins │ 2772 │ 95155903541.62695 │ PROJCS["NAD83(HARN) / Colorado North",GEOGCS["NAD8 │
│ Fort Collins │ 3743 │ 1149852889416.7852 │ PROJCS["NAD83(HARN) / UTM zone 13N",GEOGCS["NAD83( │
│ Fort Collins │ 3720 │ 1149852889416.7852 │ PROJCS["NAD83(NSRS2007) / UTM zone 13N",GEOGCS["NA │
│ Fort Collins │ 26713 │ 2889512926881.594 │ PROJCS["NAD27 / UTM zone 13N",GEOGCS["NAD27",DATUM │
│ Fort Collins │ 26913 │ 2889512926881.594 │ PROJCS["NAD83 / UTM zone 13N",GEOGCS["NAD83",DATUM │
│ Fort Collins │ 32613 │ 4227096775764.052 │ PROJCS["WGS 84 / UTM zone 13N",GEOGCS["WGS 84",DAT │
│ Fort Collins │ 32213 │ 4227096775764.052 │ PROJCS["WGS 72 / UTM zone 13N",GEOGCS["WGS 72",DAT │
│ Fort Collins │ 32413 │ 4227096775764.052 │ PROJCS["WGS 72BE / UTM zone 13N",GEOGCS["WGS 72BE" │
│ Fort Collins │ 4956 │ 90453823669825.8 │ GEOCCS["NAD83(HARN)",DATUM["NAD83_High_Accuracy_Re │
│ Fort Collins │ 4362 │ 90453823669825.8 │ GEOCCS["NAD83(HARN) (geocentric)",DATUM["NAD83_Hig │
└──────────────┴───────┴────────────────────┴────────────────────────────────────────────────────┘
No area?
At the top of the list there are two rows with geom_area = 0
. That seems odd,
further investigation shows that it's due to an oddity due to my choice to
calculate geom_area
by first transforming to GEOGRAPHY
. These
bounding boxes range from -180 to 180 and that is
known to cause issues. For my use, I don't want those SRIDs
anyway so this nuance makes it easy to filter them out with geom_area > 0
.
Before doing that, a quick check to verify these polygons indeed do span the globe.
SELECT srid, ST_Area(ST_Transform(geom, 4326)::geography),
ST_AsText(ST_Transform(geom, 4326))
FROM srid_units
WHERE srid = 3395
;
Notice the +/- 180
in the polygon.
┌──────┬─────────┬─────────────────────────────────────────────────────┐
│ srid │ st_area │ st_astext │
╞══════╪═════════╪═════════════════════════════════════════════════════╡
│ 3395 │ 0 │ POLYGON((-180 -80,-180 84,180 84,180 -80,-180 -80)) │
└──────┴─────────┴─────────────────────────────────────────────────────┘
Smaller is (probably) more accurate
Using localized SRIDs help minimize errors in calculations when
working with GEOMETRY
data.
Because the error rate in calculations grows with distance, it is reasonable
to assume that SRIDs with smaller bounding boxes will be more accurate in general.
That seems like a good enough assumption for me to use as a generalized starting
point. When edge cases are encountered, more sophisticated logic can be added
into the query being used.
The query below has been updated to omit rows that span the entire globe and to limit to the first five SRIDs sorted by area.
SELECT p.name, s.srid, s.geom_area,
LEFT(s.srtext, 50) AS srtext, p.way, s.geom
FROM osm_co.place_point p
INNER JOIN public.srid_units s
ON ST_Contains(s.geom, p.way)
AND s.units = 'Meters'
AND s.geom_area > 0
WHERE p.name = 'Fort Collins'
AND p.place = 'city'
AND s.srtext NOT ILIKE '%deprecated%'
ORDER BY s.geom_area
LIMIT 5
;
There are two different sizes of geometries in these results. Three SRIDs (2772, 3503, and 26953) have an
area of 9.5 * 10^10
and the other two (3743, 3720) have an area of
1.1 * 10^12
. This is a difference of 1.05 * 10^12 m^2
, seems significant
but just the numbers alone make it harder to grasp.
┌──────────────┬───────┬────────────────────┬────────────────────────────────────────────────────┐
│ name │ srid │ geom_area │ srtext │
╞══════════════╪═══════╪════════════════════╪════════════════════════════════════════════════════╡
│ Fort Collins │ 26953 │ 95155903541.62695 │ PROJCS["NAD83 / Colorado North",GEOGCS["NAD83",DAT │
│ Fort Collins │ 2772 │ 95155903541.62695 │ PROJCS["NAD83(HARN) / Colorado North",GEOGCS["NAD8 │
│ Fort Collins │ 3503 │ 95155903541.62695 │ PROJCS["NAD83(NSRS2007) / Colorado North",GEOGCS[" │
│ Fort Collins │ 3743 │ 1149852889416.7852 │ PROJCS["NAD83(HARN) / UTM zone 13N",GEOGCS["NAD83( │
│ Fort Collins │ 3720 │ 1149852889416.7852 │ PROJCS["NAD83(NSRS2007) / UTM zone 13N",GEOGCS["NA │
└──────────────┴───────┴────────────────────┴────────────────────────────────────────────────────┘
What does the difference of 1.05 * 10^12 m^2
look like? We can see the smaller
area is targeted within Colorado while the larger area spans from Canada to Mexico.
Answering questions like this is precisely the reason to use a tool with a spatial viewer, such as DBeaver when working with PostGIS data!
Pick a single SRID
Combining everything learned from the above exploration, this following query should
do the trick for providing a decent guess at which SRID to use for a given geometry.
This query uses Postgres'
nifty SELECT DISTINCT ON ()
, which is much cooler
than your standard SELECT DISTINCT
.
For our example, DISTINCT ON
allows us to return a single SRID for a given place name. The process of selecting which SRID to return is handled by
the ORDER BY
clause. The geom_area
is used to prefer smaller areas, and srid
makes and easy
tiebreaker to add in that ensures the results are reproducible.
SELECT DISTINCT ON (p.name) name, s.srid
FROM osm_co.place_point p
INNER JOIN public.srid_units s
ON ST_Contains(s.geom, p.way)
AND s.units = 'Meters'
AND s.geom_area > 0
WHERE p.name = 'Fort Collins'
AND p.place = 'city'
AND s.srtext NOT ILIKE '%deprecated%'
ORDER BY p.name, s.geom_area, s.srid
;
This query for Fort Collins happens to give me the SRID I would have manually chosen for northern Colorado.
┌──────────────┬──────┐
│ name │ srid │
╞══════════════╪══════╡
│ Fort Collins │ 2772 │
└──────────────┴──────┘
Now to see if expanding the query works the way I want. This query adds two more Colorado cities to the list, Denver and Pueblo.
SELECT DISTINCT ON (p.name) name, s.srid
FROM osm_co.place_point p
INNER JOIN public.srid_units s
ON ST_Contains(s.geom, p.way)
AND s.units = 'Meters'
AND s.geom_area > 0
WHERE p.name IN ('Fort Collins', 'Denver', 'Pueblo')
AND p.place = 'city'
AND s.srtext NOT ILIKE '%deprecated%'
ORDER BY p.name, s.geom_area, s.srid
;
Fort Collins and Pueblo come out with the SRIDs I expected. Though I would have picked Denver differently, generally assuming 2773 would be a better fit than 2772.
┌──────────────┬──────┐
│ name │ srid │
╞══════════════╪══════╡
│ Denver │ 2772 │
│ Fort Collins │ 2772 │
│ Pueblo │ 2774 │
└──────────────┴──────┘
On the map we can see that Denver is indeed within 2772, down into the southern portion of the boundary. I suspect that SRID 2773 would be slightly more accurate for Denver calculations, but for my purposes I do not think any differences in the calculations would change the results in a meaningful way. Granted, I'm not using this data to try to deliver a pizza to someone's balcony with a drone either!
My book, Mastering PostGIS and OpenStreetMap has a section explaining how to scale this project globally!
Collecting the data
The srid-bbox GitHub repo
has the data as well as the code used to collect and save the polygons
to PostGIS.
It's a simple Python web scraper that iterates through, extracts the data
and saves it to Postgres.
The base list of SRID's to query for are from PostGIS's
public.spatial_ref_sys
table. The following query returned 6,184 potential SRIDs.
The web scraper was able to load polygons for 3,775 of them.
SELECT srs.srid
FROM public.spatial_ref_sys srs
WHERE srs.auth_name = 'EPSG'
;
The SRIDs with no bounding box either did not have a page on the website I searched, or the retrieved bounding box information generated an error when attempting the
INSERT
. I made no attempt to try to resolve these issues.
Why not GEOGRAPHY
?
A quick suggestion to bypass this whole project might be to consider
using GEOGRAPHY
. That would provide accurate calculations anywhere in the world. The downside there is considerably
slower performance. Jacob Baskin from coord.com found GEOMETRY
to be 76% faster than GEOGRAPHY
when working with small data sets (< 3,000 rows). I haven't checked the row count
for the full production data set yet, but with just Colorado it's nearly 1M rows.
A quick comparison with one aggregate query at this scale completed in 1.3
seconds with GEOMETRY
while GEOGRAPHY
took 26 seconds. This makes GEOMETRY
95% faster for that query!
Note: Jacob wrote "the performance penalty levels off at 4x" when working with 3,000 rows of data. Their raw performance data is not available, but looking at the chart in that post it looks like
GEOMETRY
returned in 9ms whileGEOGRAPHY
(with spheroid) took about 37ms. From those times I calculateGEOMETRY
is 76% faster((9ms - 37ms) / 37ms = -0.756
).
The other reason to use GEOMETRY
is there are hundreds of functions in PostGIS
to operate on GEOMETRY
. For GEOGRAPHY
, there are about 30 functions.
Summary
Trying to determine SRID for localized spatial calculations has always been a challenge for me. This challenge was the reason I created the srid_units
view a while back,
this new side project is one more step in the progression of making it easier.
The queries above give a reasonable starting point on how to use spatial joins
and filters to work out logic appropriate for a varieties of use cases.
This effort has already proven worthwhile for me! In a matter of a couple
hours I was able to take a massive, tedious, manual process and
reduce it to a single, simple spatial SQL query. This data set makes it possible
to continue working with GEOMETRY
, get accurate calculations, and retain some
semblance of performance!
The data is out there on the GitHub repo, hopefully others find it useful too!
Need help with your PostgreSQL servers or databases? Contact us to start the conversation!