Postgres 15: Explain Buffer now with Temp Timings
This post explores a helpful addition to Postgres 15's EXPLAIN output when
using BUFFERS.
The Postgres 15 release notes
mention lists this item:
- Add
EXPLAIN (BUFFERS)output for temporary file block I/O (Masahiko Sawada)
This improvement adds new detail to the output provided from Postgres 15 when running
EXPLAIN (ANALYZE, BUFFERS) <query>. This post explores this feature along with
a couple different ways the reported I/O timing interacts with performance tuning.
Getting the feature
The first thing you need is to be using at least Postgres 15.
Your instance also needs to have track_io_timing=on in your Postgres configuration
file, postgresql.conf.
Check the value of this setting with SHOW track_io_timing;.
SHOW track_io_timing;
┌─────────────────┐
│ track_io_timing │
╞═════════════════╡
│ on │
└─────────────────┘
Test Data and Server
This post used a Postgres 15 instance on a
Digital Ocean droplet
with 2 AMD CPU and 2 GB RAM. I loaded Colorado OpenStreetMap data via
PgOSM Flex.
The data loaded to the osm schema
weighs in at 2.5 GB, the public schema, 3.3 GB, has the raw data needed to enable
PgOSM Flex's --replication feature via osm2pgsql-replication. The advantage
to having more data than RAM is it's pretty easy to show I/O timings, which I need
for this post!
Using v4 of the Postgres H3 extension
I wrote about using the H3 extension last year in Using Uber's H3 hex grid in PostGIS and H3 indexes for performance with PostGIS data. Naturally, things have changed over the past 12 months, specifically version 4 of the H3 Postgres extension was released. The H3 Postgres extension (h3-pg) closely follows the upstream H3 project, including naming conventions. The changes made in H3 version 4 unfortunately changed every function name used in my original blog posts. It seems this mass renaming was a one-time alignment in the H3 project, hopefully they don't all get renamed again.
This post covers the changes required to migrate the code in my prior two posts work with version 4.x h3-pg.
Create the h3 extension
Creating the extension for PostGIS usage now involves installing two (2)
extensions. Some components have been split out into the h3_postgis
extension.
I use CASCADE when installing the h3_postgis portion since that also
requires postgis_raster which I do not have installed by default.
CREATE EXTENSION h3;
CREATE EXTENSION h3_postgis CASCADE;
Track performance differences with pg_stat_statements
This is my entry for PgSQL Phriday #008. It's Saturday, so I guess this is a day
late!
This
month's topic,
chosen by Michael from pgMustard,
is on the excellent pg_stat_statements extension.
When I saw Michael was the host this month I knew he'd pick a topic I would want
to contribute on!
Michael's post for his own topic
provides helpful queries and good reminders about changes to columns
between Postgres version 12 and 13.
In this post I show one way I like using pg_stat_statements: tracking the
impact of configuration changes to a specific workload. I used a contrived change
to configuration to quickly make an obvious impact.
Process to test
I am using PgOSM Flex to load Colorado OpenStreetMap data to PostGIS. PgOSM Flex uses a multi-step ETL that prepares the database, runs osm2pgsql, and then runs multiple post-processing steps. This results in 2.4 GB of data into Postgres. That should be enough activity to show something interesting.
PgOSM Flex for Production OpenStreetMap data
The PgOSM Flex Project is looking forward to the 0.8.0! If you aren't familiar with PgOSM Flex, it is a tool that loads high quality OpenStreetMap datasets to PostGIS using osm2pgsql. I have a few examples of using OpenStreetMap data loaded this way.
I am extremely excited about PgOSM Flex 0.8.0 because the project as a whole is really starting to feel "production ready." While I have been using PgOSM Flex in production for more than 2 years, there have been a few rough edges over that time. However, the improvements over the past year have brought a number of amazing components together.
PgOSM Flex 0.8.0 does include a few ⚠️ breaking changes! ⚠️ Read the release notes for full details.
PgOSM Flex in production
What does "in production" mean for a tool in a data pipeline?
- Reliable
- Easy to try out
- Easy to load/update to prod
- Low friction software updates
This post covers why I think PgOSM Flex meets all of those requirements.
Accuracy of Geometry data in PostGIS
A common use case with PostGIS data is to calculate things, such as distances
between points, lengths of lines, and the area of polygons.
The topic of accuracy, or inaccuracy, with GEOMETRY data comes up often.
The most frequent offenders are generic SRIDs such as 3857 and 4326. In some projects
accuracy is paramount. Non-negotiable. On the other hand, plenty of projects
do not need accurate calculations. Those projects often rely on relationships
between calculations, not the actual values of the calculations themselves.
If Coffee shop Y is 4 times further away than Coffee shop Z. I'll often go to
Coffee shop Z just based on that.
In most cases, users should still understand how significant the errors are. This post explores one approach to determine the how accurate (or not!) the calculations of a given SRID are in a particular region, based on latitude (North/South). The queries used in this post can be adjusted for your specific area.
Set the stage
The calculations in this post focus on the distance of two points situated 40 decimal degrees apart. The points are created in pairs of west/east points at -120 (W) and -80 (W). Those were picked arbitrarily, though intentionally spread far enough apart to make the errors in distance calculations feel obviously significant. The point pairs are created in 5 decimal degree intervals of latitude from 80 North to 80 South. The following screenshot shows how the points frame in much of North America.
While the points on the map using a mercator projection appear to be equidistant... they are not!
