RustProof Labs: blogging for education
My book Mastering PostGIS and OpenStreetMap is available!

Track performance differences with pg_stat_statements

By Ryan Lambert -- Published May 06, 2023

This is my entry for PgSQL Phriday #008. It's Saturday, so I guess this is a day late! This month's topic, chosen by Michael from pgMustard, is on the excellent pg_stat_statements extension. When I saw Michael was the host this month I knew he'd pick a topic I would want to contribute to! Michael's post for his own topic provides helpful queries and good reminders about changes to columns between Postgres versions 12 and 13.

In this post I show one way I like using pg_stat_statements: tracking the impact of configuration changes on a specific workload. I used a contrived configuration change to quickly make an obvious impact.
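The basic pattern is: reset the counters, run the workload, then query the view. A minimal sketch of that workflow (not the exact queries from this post; column names shown are the Postgres 13+ names):

```sql
-- Requires pg_stat_statements in shared_preload_libraries and
--     CREATE EXTENSION pg_stat_statements;

-- Reset counters so the view only reflects the workload under test
SELECT pg_stat_statements_reset();

-- ... run the workload here, e.g. the PgOSM Flex load ...

-- Top statements by total execution time.  On Postgres 12 and
-- earlier use total_time / mean_time instead.
SELECT calls, total_exec_time, mean_exec_time,
        rows, query
    FROM pg_stat_statements
    ORDER BY total_exec_time DESC
    LIMIT 10;
```

Running the same reset-and-query cycle before and after a configuration change gives two comparable snapshots of the workload.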

Process to test

I am using PgOSM Flex to load Colorado OpenStreetMap data to PostGIS. PgOSM Flex uses a multi-step ETL that prepares the database, runs osm2pgsql, and then runs multiple post-processing steps. This loads 2.4 GB of data into Postgres. That should be enough activity to show something interesting.

Continue Reading

PgOSM Flex for Production OpenStreetMap data

By Ryan Lambert -- Published April 21, 2023

The PgOSM Flex Project is looking forward to the 0.8.0 release! If you aren't familiar with PgOSM Flex, it is a tool that loads high-quality OpenStreetMap datasets to PostGIS using osm2pgsql. I have a few examples of using OpenStreetMap data loaded this way.

I am extremely excited about PgOSM Flex 0.8.0 because the project as a whole is really starting to feel "production ready." While I have been using PgOSM Flex in production for more than 2 years, there have been a few rough edges over that time. However, the improvements over the past year have brought a number of amazing components together.

PgOSM Flex 0.8.0 does include a few ⚠️ breaking changes! ⚠️ Read the release notes for full details.

PgOSM Flex in production

What does "in production" mean for a tool in a data pipeline?

  • Reliable
  • Easy to try out
  • Easy to load/update to prod
  • Low friction software updates

This post covers why I think PgOSM Flex meets all of those requirements.

Continue Reading

Accuracy of Geometry data in PostGIS

By Ryan Lambert -- Published April 15, 2023

A common use case with PostGIS data is to calculate things, such as distances between points, lengths of lines, and the area of polygons. The topic of accuracy, or inaccuracy, with GEOMETRY data comes up often. The most frequent offenders are generic SRIDs such as 3857 and 4326. In some projects accuracy is paramount. Non-negotiable. On the other hand, plenty of projects do not need accurate calculations. Those projects often rely on relationships between calculations, not the actual values of the calculations themselves. If coffee shop Y is 4 times further away than coffee shop Z, I'll often go to coffee shop Z based on that alone.

In most cases, users should still understand how significant the errors are. This post explores one approach to determine how accurate (or not!) the calculations of a given SRID are in a particular region, based on latitude (North/South). The queries used in this post can be adjusted for your specific area.

Set the stage

The calculations in this post focus on the distance between two points situated 40 decimal degrees apart. The points are created in west/east pairs at longitudes -120 (W) and -80 (W). Those were picked arbitrarily, though intentionally spread far enough apart to make the errors in distance calculations feel obviously significant. The point pairs are created at 5 decimal degree intervals of latitude from 80 North to 80 South. The following screenshot shows how the points frame much of North America.

While the points on the map using a Mercator projection appear to be equidistant... they are not!

Screenshot showing the points used for distance checks at 5 degree latitude intervals. The west and east points roughly frame most of North America; the exact longitudes were chosen because they are simple round numbers, not for any other relevance.
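To put a number on the error, one approach (a hypothetical illustration, not necessarily the queries from the post) is to compare the planar distance computed in a projected SRID against the geodesic distance from the GEOGRAPHY type, which PostGIS calculates on the spheroid. For example, for the point pair at 45 North:

```sql
-- Distance between the west/east pair at 45 N, computed two ways.
-- EPSG:3857 (web Mercator) inflates distances away from the equator;
-- casting to GEOGRAPHY gives the geodesic distance in meters.
WITH pts AS (
    SELECT ST_SetSRID(ST_MakePoint(-120, 45), 4326) AS w_pt,
            ST_SetSRID(ST_MakePoint(-80, 45), 4326) AS e_pt
)
SELECT ST_Distance(ST_Transform(w_pt, 3857),
                    ST_Transform(e_pt, 3857)) AS meters_3857,
        ST_Distance(w_pt::GEOGRAPHY, e_pt::GEOGRAPHY) AS meters_geog
    FROM pts;
```

Swapping in different latitudes, or different SRIDs in the ST_Transform() calls, shows how the planar error grows or shrinks by region.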

Continue Reading

Audit Data with Triggers: PGSQL Phriday #007

By Ryan Lambert -- Published April 07, 2023

Welcome to another #PGSQLPhriday post! This month's host is Lætitia Avrot, who picked the topic of Triggers with these questions:

"Do you love them? Do you hate them? Do you sometimes love them sometimes hate them? And, most importantly, why? Do you know legitimate use cases for them? How to mitigate their drawbacks (if you think they have any)?"

Let's dive in!

Triggers are a specialized tool

I rarely use triggers. I don't hate triggers, I just think they should be used sparingly. Like any specialized tool, you should not expect to use triggers for every occasion where they could be used. However... there is one notable use case where I really like triggers: audit tables. Part of the magic of using triggers for auditing data changes in Postgres is the JSON/JSONB support available.
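As a sketch of that pattern (table and function names here are hypothetical, not taken from the post), an AFTER trigger can serialize the affected row with to_jsonb() and write it to an audit table:

```sql
-- Hypothetical audit table: one row per change, old/new row as JSONB
CREATE TABLE audit_log (
    audit_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    table_name TEXT NOT NULL,
    action TEXT NOT NULL,
    changed_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    old_row JSONB,
    new_row JSONB
);

CREATE OR REPLACE FUNCTION log_row_change()
    RETURNS TRIGGER
    LANGUAGE plpgsql
AS $$
BEGIN
    INSERT INTO audit_log (table_name, action, old_row, new_row)
        VALUES (TG_TABLE_NAME, TG_OP,
                CASE WHEN TG_OP = 'INSERT' THEN NULL
                     ELSE to_jsonb(OLD) END,
                CASE WHEN TG_OP = 'DELETE' THEN NULL
                     ELSE to_jsonb(NEW) END);
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$;

-- Attach to any table to audit (EXECUTE FUNCTION needs Postgres 11+)
CREATE TRIGGER audit_my_table
    AFTER INSERT OR UPDATE OR DELETE ON my_table
    FOR EACH ROW EXECUTE FUNCTION log_row_change();
```

Because the rows land as JSONB, one audit table can serve many source tables without tracking their individual column lists.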

Continue Reading

PGSQL Phriday #005 Recap

By Ryan Lambert -- Published February 10, 2023

Thank you everyone who contributed to PgSQL Phriday #005! This month's topic: "Is your data relational?" If I missed any contributions, or if new ones are published, let me know and I'll try to update this post. These snippets are in a somewhat random order, loosely threaded together by sub-topic.

Contributing posts

Hetti D. wrote a great post starting by addressing the bonus question. I put that question last partly because I have struggled with a succinct definition myself. I also put it last because I hoped the initial 3 questions would lead us to answer the bonus question in our own ways. Hetti also discusses storing blobs and objects, and considerations between complexities and trade-offs with more targeted technology.

Continue Reading
