RustProof Labs: blogging for education (logo)

Find your local SRID in PostGIS

By Ryan Lambert -- Published November 04, 2020

The past few weeks I had been tossing around some ideas that resulted in me looking for a particular data set. I needed to get the bounding boxes for the most commonly used SRIDs (Spatial Reference IDentifier) in PostGIS to join with the public.spatial_ref_sys table. My hope was to be able to use the data to quickly identify local SRIDs for geometries spreading across the U.S. This data was needed to support another idea where I want both accurate spatial calculations and the best possible performance when working with large OpenStreetMap data sets.

The good news is now I have the exact data I was looking for. The unexpected bonus is that there is a much broader use case for this data in providing an easy way to find which SRIDs might be appropriate for a specific area!

This post explores this new data with an example of how to use it with pre-existing spatial data.

Finding the data source

After spending some time looking and asking around, turning up nothing in bulk, the sole source of this information seemed to be https://spatialreference.org. For example, SRID 2772 has this listed at the top of the page:

This is exactly the detail I needed! I double checked these bounds with against my expectations by putting these coordinates into a query to verify this lines up with the area of Northern Colorado I would expect for SRID 2772.

SELECT 2772 AS srid,
    ST_SetSRID(
        ST_MakeBox2D(
            ST_Point(-109.05, 39.56),
            ST_Point( -102.05, 41.00)),
        4326) AS geom
;

Screenshot from DBeaver's spatial viewer showing the bounding box for SRID 2772.  A blue box represents the bounding box polygon and contains Denver toward the lower center, extending North (up) to the Wyoming border and East/West to the Colorado borders.

This post is part of the series PostgreSQL: From Idea to Database.

Load SRID polygons into PostGIS

The SRID bounding box data is available on GitHub in a .sql file for easy loading into PostGIS. The repo also includes a .geojson file for use with other systems. Getting the data into Postgres (with PostGIS already installed) is two simple commands to download and import.

wget https://raw.githubusercontent.com/rustprooflabs/srid-bbox/main/srid_bbox.sql
psql -d pgosm -f srid_bbox.sql

This creates a table with the polygon data and creates a view public.srid_units for exploring SRIDs, their units (meters, feet, decimal degrees) and the bounding box when it is available. A previous version of this view is available as a gist without the SRID bounding box information.

How to Use

To start exploring the data I filter on the point of Fort Collins, Colorado and join to the the public.srid_units view. This returns 21 SRIDs to consider. Most SRIDs (14) have units in meters, the rest are split between decimal degrees (5) and feet (4).

SELECT p.name, s.units, COUNT(*)
    FROM osm_co.place_point p
    INNER JOIN public.srid_units s 
        ON ST_Contains(s.geom, p.way)
    WHERE p.name = 'Fort Collins' AND p.place = 'city'
        AND s.srtext NOT ILIKE '%deprecated%'
    GROUP BY p.name, s.units
    ORDER BY COUNT(*) DESC
;

Results.

┌──────────────┬─────────────────┬───────┐
│     name     │      units      │ count │
╞══════════════╪═════════════════╪═══════╡
│ Fort Collins │ Meters          │    14 │
│ Fort Collins │ Decimal Degrees │     5 │
│ Fort Collins │ Feet            │     4 │
└──────────────┴─────────────────┴───────┘

Knowing that I want units to be in meters, the query can be updated to explore the 14 options in meters a bit closer. Let's see how the bounding boxes relate to the area of interest.

SELECT p.name, s.srid, s.geom_area,
        LEFT(s.srtext, 50) AS srtext, p.way, s.geom 
    FROM osm_co.place_point p
    INNER JOIN public.srid_units s 
        ON ST_Contains(s.geom, p.way)
           AND s.units = 'Meters'
    WHERE p.name = 'Fort Collins'
        AND p.place = 'city'
        AND s.srtext NOT ILIKE '%deprecated%'
;

Obviously we have some worldwide results we probably don't want.

Screenshot the 14 SRIDs with bounding boxes containing Fort Collins, Colorado.  The resulting image is overlapping red boxes, some covering the entire world's extents.

Look closer at the details to continue narrowing down the list to eventually pick a single SRID to use.

┌──────────────┬───────┬────────────────────┬────────────────────────────────────────────────────┐
│     name     │ srid  │     geom_area      │                       srtext                       │
╞══════════════╪═══════╪════════════════════╪════════════════════════════════════════════════════╡
│ Fort Collins │  3410 │                  0 │ PROJCS["NSIDC EASE-Grid Global",GEOGCS["Unspecifie │
│ Fort Collins │  3395 │                  0 │ PROJCS["WGS 84 / World Mercator",GEOGCS["WGS 84",D │
│ Fort Collins │  3503 │  95155903541.62695 │ PROJCS["NAD83(NSRS2007) / Colorado North",GEOGCS[" │
│ Fort Collins │ 26953 │  95155903541.62695 │ PROJCS["NAD83 / Colorado North",GEOGCS["NAD83",DAT │
│ Fort Collins │  2772 │  95155903541.62695 │ PROJCS["NAD83(HARN) / Colorado North",GEOGCS["NAD8 │
│ Fort Collins │  3743 │ 1149852889416.7852 │ PROJCS["NAD83(HARN) / UTM zone 13N",GEOGCS["NAD83( │
│ Fort Collins │  3720 │ 1149852889416.7852 │ PROJCS["NAD83(NSRS2007) / UTM zone 13N",GEOGCS["NA │
│ Fort Collins │ 26713 │  2889512926881.594 │ PROJCS["NAD27 / UTM zone 13N",GEOGCS["NAD27",DATUM │
│ Fort Collins │ 26913 │  2889512926881.594 │ PROJCS["NAD83 / UTM zone 13N",GEOGCS["NAD83",DATUM │
│ Fort Collins │ 32613 │  4227096775764.052 │ PROJCS["WGS 84 / UTM zone 13N",GEOGCS["WGS 84",DAT │
│ Fort Collins │ 32213 │  4227096775764.052 │ PROJCS["WGS 72 / UTM zone 13N",GEOGCS["WGS 72",DAT │
│ Fort Collins │ 32413 │  4227096775764.052 │ PROJCS["WGS 72BE / UTM zone 13N",GEOGCS["WGS 72BE" │
│ Fort Collins │  4956 │   90453823669825.8 │ GEOCCS["NAD83(HARN)",DATUM["NAD83_High_Accuracy_Re │
│ Fort Collins │  4362 │   90453823669825.8 │ GEOCCS["NAD83(HARN) (geocentric)",DATUM["NAD83_Hig │
└──────────────┴───────┴────────────────────┴────────────────────────────────────────────────────┘

No area?

At the top of the list there are two rows with geom_area = 0. That seems odd, further investigation shows that it's due to an oddity due to my choice to calculate geom_area by first transforming to GEOGRAPHY. These bounding boxes range from -180 to 180 and that is known to cause issues. For my use, I don't want those SRIDs anyway so this nuance makes it easy to filter them out with geom_area > 0.

Before doing that, a quick check to verify these polygons indeed do span the globe.

SELECT srid, ST_Area(ST_Transform(geom, 4326)::geography),
        ST_AsText(ST_Transform(geom, 4326))
    FROM srid_units
    WHERE srid = 3395    
;

Notice the +/- 180 in the polygon.

┌──────┬─────────┬─────────────────────────────────────────────────────┐
│ srid │ st_area │                      st_astext                      │
╞══════╪═════════╪═════════════════════════════════════════════════════╡
│ 3395 │       0 │ POLYGON((-180 -80,-180 84,180 84,180 -80,-180 -80)) │
└──────┴─────────┴─────────────────────────────────────────────────────┘

Smaller is (probably) more accurate

Using localized SRIDs help minimize errors in calculations when working with GEOMETRY data. Because the error rate in calculations grows with distance, it is reasonable to assume that SRIDs with smaller bounding boxes will be more accurate in general. That seems like a good enough assumption for me to use as a generalized starting point. When edge cases are encountered, more sophisticated logic can be added into the query being used.

The query below has been updated to omit rows that span the entire globe and to limit to the first five SRIDs sorted by area.

SELECT p.name, s.srid, s.geom_area,
        LEFT(s.srtext, 50) AS srtext, p.way, s.geom 
    FROM osm_co.place_point p
    INNER JOIN public.srid_units s 
        ON ST_Contains(s.geom, p.way)
           AND s.units = 'Meters'
           AND s.geom_area > 0
    WHERE p.name = 'Fort Collins'
        AND p.place = 'city'
        AND s.srtext NOT ILIKE '%deprecated%'
    ORDER BY s.geom_area
    LIMIT 5
;

There are two different sizes of geometries in these results. Three SRIDs (2772, 3503, and 26953) have an area of 9.5 * 10^10 and the other two (3743, 3720) have an area of 1.1 * 10^12. This is a difference of 1.05 * 10^12 m^2, seems significant but just the numbers alone make it harder to grasp.

┌──────────────┬───────┬────────────────────┬────────────────────────────────────────────────────┐
│     name     │ srid  │     geom_area      │                       srtext                       │
╞══════════════╪═══════╪════════════════════╪════════════════════════════════════════════════════╡
│ Fort Collins │ 26953 │  95155903541.62695 │ PROJCS["NAD83 / Colorado North",GEOGCS["NAD83",DAT │
│ Fort Collins │  2772 │  95155903541.62695 │ PROJCS["NAD83(HARN) / Colorado North",GEOGCS["NAD8 │
│ Fort Collins │  3503 │  95155903541.62695 │ PROJCS["NAD83(NSRS2007) / Colorado North",GEOGCS[" │
│ Fort Collins │  3743 │ 1149852889416.7852 │ PROJCS["NAD83(HARN) / UTM zone 13N",GEOGCS["NAD83( │
│ Fort Collins │  3720 │ 1149852889416.7852 │ PROJCS["NAD83(NSRS2007) / UTM zone 13N",GEOGCS["NA │
└──────────────┴───────┴────────────────────┴────────────────────────────────────────────────────┘

What does the difference of 1.05 * 10^12 m^2 look like? We can see the smaller area is targeted within Colorado while the larger area spans from Canada to Mexico.

Screenshot showing difference between the two sizes of bounding boxes identified above.  The smaller region fits entirely within Colorado's borders while the larger region spans from Canada on the north down into Mexico on the south.

Answering questions like this is precisely the reason to use a tool with a spatial viewer, such as DBeaver when working with PostGIS data!

Pick a single SRID

Combining everything learned from the above exploration, this following query should do the trick for providing a decent guess at which SRID to use for a given geometry. This query uses Postgres' nifty SELECT DISTINCT ON (), which is much cooler than your standard SELECT DISTINCT. For our example, DISTINCT ON allows us to return a single SRID for a given place name. The process of selecting which SRID to return is handled by the ORDER BY clause. The geom_area is used to prefer smaller areas, and srid makes and easy tiebreaker to add in that ensures the results are reproducible.

SELECT DISTINCT ON (p.name) name, s.srid
    FROM osm_co.place_point p
    INNER JOIN public.srid_units s 
        ON ST_Contains(s.geom, p.way)
           AND s.units = 'Meters'
           AND s.geom_area > 0
    WHERE p.name = 'Fort Collins'
        AND p.place = 'city'
        AND s.srtext NOT ILIKE '%deprecated%'
    ORDER BY p.name, s.geom_area, s.srid
;

This query for Fort Collins happens to give me the SRID I would have manually chosen for northern Colorado.

┌──────────────┬──────┐
│     name     │ srid │
╞══════════════╪══════╡
│ Fort Collins │ 2772 │
└──────────────┴──────┘

Now to see if expanding the query works the way I want. This query adds two more Colorado cities to the list, Denver and Pueblo.

SELECT DISTINCT ON (p.name) name, s.srid
    FROM osm_co.place_point p
    INNER JOIN public.srid_units s 
        ON ST_Contains(s.geom, p.way)
           AND s.units = 'Meters'
           AND s.geom_area > 0
    WHERE p.name IN ('Fort Collins', 'Denver', 'Pueblo') 
        AND p.place = 'city'
        AND s.srtext NOT ILIKE '%deprecated%'
    ORDER BY p.name, s.geom_area, s.srid
;

Fort Collins and Pueblo come out with the SRIDs I expected. Though I would have picked Denver differently, generally assuming 2773 would be a better fit than 2772.

┌──────────────┬──────┐
│     name     │ srid │
╞══════════════╪══════╡
│ Denver       │ 2772 │
│ Fort Collins │ 2772 │
│ Pueblo       │ 2774 │
└──────────────┴──────┘

On the map we can see that Denver is indeed within 2772, down into the southern portion of the boundary. I suspect that SRID 2773 would be slightly more accurate for Denver calculations, but for my purposes I do not think any differences in the calculations would change the results in a meaningful way. Granted, I'm not using this data to try to deliver a pizza to someone's balcony with a drone either!

Screenshot showing Denver and Fort Collins points (blue) inside the red polygon for SRID 2772, and Pueblo inside SRID 2774.  2772 covers northern Colorado and 2774 covers Southern Colorado.  There is a gap between the two red polygons where SRID 2773 would be if it had been included.

Collecting the data

The srid-bbox GitHub repo has the data as well as the code used to collect and save the polygons to PostGIS. It's a simple Python web scraper that iterates through, extracts the data and saves it to Postgres. The base list of SRID's to query for are from PostGIS's public.spatial_ref_sys table. The following query returned 6,184 potential SRIDs. The web scraper was able to load polygons for 3,775 of them.

SELECT srs.srid
    FROM public.spatial_ref_sys srs
    WHERE srs.auth_name = 'EPSG'
;

The SRIDs with no bounding box either did not have a page on the website I searched, or the retrieved bounding box information generated an error when attempting the INSERT. I made no attempt to try to resolve these issues.

Why not GEOGRAPHY?

A quick suggestion to bypass this whole project might be to consider using GEOGRAPHY. That would provide accurate calculations anywhere in the world. The downside there is considerably slower performance. Jacob Baskin from coord.com found GEOMETRY to be 76% faster than GEOGRAPHY when working with small data sets (< 3,000 rows). I haven't checked the row count for the full production data set yet, but with just Colorado it's nearly 1M rows. A quick comparison with one aggregate query at this scale completed in 1.3 seconds with GEOMETRY while GEOGRAPHY took 26 seconds. This makes GEOMETRY 95% faster for that query!

Note: Jacob wrote "the performance penalty levels off at 4x" when working with 3,000 rows of data. Their raw performance data is not available, but looking at the chart in that post it looks like GEOMETRY returned in 9ms while GEOGRAPHY (with spheriod) took about 37ms. From those times I calculate GEOMETRY is 76% faster((9ms - 37ms) / 37ms = -0.756).

The other reason to use GEOMETRY is there are hundreds of functions in PostGIS to operate on GEOMETRY. For GEOGRAPHY, there are about 30 functions.

Summary

Trying to determine SRID for localized spatial calculations has always been a challenge for me. This challenge was the reason I created the srid_units view a while back, this new side project is one more step in the progression of making it easier. The queries above give a reasonable starting point on how to use spatial joins and filters to work out logic appropriate for a varieties of use cases.

This effort has already proven worthwhile for me! In a matter of a couple hours I was able to take a massive, tedious, manual process and reduce it to a single, simple spatial SQL query. This data set makes it possible to continue working with GEOMETRY, get accurate calculations, and retain some semblance of performance!

The data is out there on the GitHub repo, hopefully others find it useful too!

Need help with your PostgreSQL servers or databases? Contact us to start the conversation!

By Ryan Lambert
Published November 04, 2020
Last Updated November 04, 2020