RustProof Labs: blogging for education (logo)

Find your local SRID in PostGIS

By Ryan Lambert -- Published November 04, 2020

The past few weeks I had been tossing around some ideas that resulted in me looking for a particular data set. I needed to get the bounding boxes for the most commonly used SRIDs (Spatial Reference IDentifier) in PostGIS to join with the public.spatial_ref_sys table. My hope was to be able to use the data to quickly identify local SRIDs for geometries spreading across the U.S. This data was needed to support another idea where I want both accurate spatial calculations and the best possible performance when working with large OpenStreetMap data sets.

The good news is now I have the exact data I was looking for. The unexpected bonus is that there is a much broader use case for this data in providing an easy way to find which SRIDs might be appropriate for a specific area!

This post explores this new data with an example of how to use it with pre-existing spatial data.

Continue Reading

Postgres 13 Performance with OpenStreetMap data

By Ryan Lambert -- Published October 19, 2020

With Postgres 13 released recently, a new season of testing has begun! I recently wrote about the impact of BTree index deduplication, finding that improvement is a serious win. This post continues looking at Postgres 13 by examining performance through a few steps of an OpenStreetMap data (PostGIS) workflow.

Reasons to upgrade

Performance appears to be a strong advantage to Postgres 13 over Postgres 12. Marc Linster wrote there's "not one headline-grabbing feature, but rather a wide variety of improvements along with updates to previously released features." I am finding that to be an appropriate description. At this point I intend to upgrade our servers for the improved performance, plus a few other cool benefits.

Continue Reading

PostgreSQL 13Beta3: B-Tree index deduplication

By Ryan Lambert -- Published September 06, 2020

PostgreSQL 13 development is coming along nicely, Postgres 13 Beta3 was released on 8/13/2020. The Postgres Beta 1 and 2 releases were released in May and June 2020. One of the features that has my interest in Postgres 13 is the B-Tree deduplication effort. B-Tree indexes are the default indexing method in Postgres, and are likely the most-used indexes in production environments. Any improvements to this part of the database are likely to have wide-reaching benefits. Removing duplication from indexes keeps their physical size smaller, reduces I/O overhead, and should help keep SELECT queries fast!

Continue Reading

PostgreSQL: Integers, on-disk order, and performance

By Ryan Lambert -- Published August 05, 2020

This post examines how the on-disk order of integer data can influence performance in PostgreSQL. When working with relational databases, you often do not need to think about data storage, though there are times when these details can have a noticeable impact on your database's performance.

This post uses PostgreSQL 12.3 on Ubuntu 18.04 on a DigitalOcean droplet with 4 CPU and 8 GB RAM droplet, aka "Rig B" from Scaling osm2pgsql: Process and costs.

Continue Reading

Large Text in PostgreSQL: Performance and Storage

By Ryan Lambert -- Published July 05, 2020

Storing large amounts of data in a single cell in the database has long been a point of discussion. This topic has surfaced in the form of design questions in database projects I have been involved with over the years. Often, it surfaces as a request to store images, PDFs, or other "non-relational" data directly in the database. I was an advocate for storing files on the file system for many, if not all, of those scenarios.

Then, after years of working with PostGIS data I had the realization that much of my vector data that performs so well when properly structured and queried, was larger and more complex than many other blobs of data I had previously resisted. Two years ago I made the decision to store images in a production database using BYTEA. We can guarantee there are a limited number of images with a controlled maximum resolution (limiting size) and a specific use case. There was also the knowledge that caching the images in the frontend would be an easy solution if performance started declining. This system is approaching two years in production with great performance. I am so glad the project has a singular data source: PostgreSQL!

Continue Reading