
PgDD extension moves to Pgx

By Ryan Lambert -- Published October 08, 2021

Our data dictionary extension, PgDD, has been rewritten using the pgx framework in Rust! At this time I have tagged 0.4.0.rc3 and just need to do a bit more testing before the official 0.4.0 release. While I am excited about this news for PgDD, what is even more exciting is the pgx framework itself and the ease it brings to developing Postgres extensions! Getting started with pgx is straightforward, and using cargo pgx run makes it simple to build your extension against multiple versions of Postgres.

This post outlines how I came to the decision to use pgx for Postgres extension development.

Progression of PgDD

Until now, PgDD was a raw SQL extension, with that version itself being an evolution of prior iterations. Shortly after converting PgDD to a raw SQL extension I wanted it to do more, specifically to support newer Postgres features such as generated columns and native partitioning. Supporting features from new versions of Postgres is a good idea, but I couldn't drop support for older versions at that time either. Take generated columns as an example: the feature was added in Postgres 12 and came along with an update to the pg_catalog.pg_attribute system catalog. In Pg 12 and newer, pg_attribute has a column named attgenerated; earlier versions of Postgres do not have that column.
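To illustrate the catalog difference (this is illustrative only, not PgDD's actual code, and public.my_table is a placeholder), a query like the following works on Pg 12 and newer, while an extension supporting older versions needs a variant that omits attgenerated entirely:

```sql
-- Postgres 12+ only: attgenerated does not exist in earlier versions.
-- A value of 's' marks a generated (STORED) column.
SELECT a.attname,
       a.attgenerated = 's' AS is_generated
    FROM pg_catalog.pg_attribute a
    WHERE a.attrelid = 'public.my_table'::regclass  -- placeholder table
        AND a.attnum > 0        -- skip system columns
        AND NOT a.attisdropped  -- skip dropped columns
;
```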

Continue Reading

(Webinar) Routing with PostGIS and OpenStreetMap

By Ryan Lambert -- Published October 04, 2021

I am excited to be presenting Routing with PostGIS and OpenStreetMap at PostgresConf South Africa 2021! The talk is scheduled for Tuesday October 5, 2021, 3:10 PM SAST (7:10 AM MST).

This page has the resources used during this session.

Downloads for session

Scripts used for the demo:

Continue Reading

Psycopg3 Initial Review

By Ryan Lambert -- Published September 07, 2021

If you use Postgres and Python together you are almost certainly familiar with psycopg2. Daniele Varrazzo has been the maintainer of the psycopg project for many years. In 2020 Daniele started working full-time on creating psycopg3, the successor to psycopg2. Recently, the Beta 1 release of psycopg3 was made available on PyPI. This post highlights two pieces of happy news with psycopg3:

  • Migration is easy
  • The connection pool rocks

As the first section shows, migration from psycopg2 to psycopg3 is quite easy. The majority of this post is dedicated to examining psycopg3's connection pool and the difference this feature can make to your application's performance.

Migration

Easy migration is an important feature to encourage developers to upgrade. It is frustrating when a "simple upgrade" turns into a cascade of error after error throughout your application. Luckily for us, psycopg3 got this part right! In the past week I fully migrated two projects to psycopg3 and started migrating two more. So far the friction has been very low and confined to edge-case uses.

The following simplified example shows how my projects have used psycopg2.

Continue Reading

Identify OpenStreetMap changes with Postgres

By Ryan Lambert -- Published August 30, 2021

The data in the main OpenStreetMap database is constantly changing. Folks around the world are almost certainly saving changes via JOSM, iD, and other editors as you read these words. With the data changing constantly, it is often desirable to have an idea of what has actually changed. This post explores one approach to tracking changes to the tags attribute data once it has been loaded to Postgres.

The topic of this post surfaced while I was working on refreshing a project involving travel times (routing). In the process I noticed a few instances where the analysis had shifted significantly. My first hunch was that entire segments of road had been added or removed, but that was not the cause. It became apparent that tags in the area had been improved. It was easy to point specifically to the value associated with the highway key, but I also knew there were other changes happening; I just wasn't sure what all was involved, or at what scale.

Calculate tag hash

The database I am working in has five Colorado snapshots loaded, spanning back to 2018. The tags data is loaded to a table named osmts.tags; read my post Timescale, Compression and OpenStreetMap Tags for how this table was created. The tags table has one row for every OpenStreetMap feature and stores the full key/value attribute data in a JSONB column (osmts.tags.tags). A relatively simple way to detect change in this data is to calculate a hash of each feature's key/value data. Comparing hashes across snapshots identifies rows whose attribute data changed.
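As a minimal sketch of the approach, assuming osm_id identifies a feature and osm_date identifies the snapshot (both column names and the snapshot dates are assumptions here; only the tags JSONB column is described above):

```sql
-- Fingerprint each feature's key/value data.  Casting JSONB to text
-- is deterministic in Postgres, so md5() yields a stable hash.
SELECT t.osm_id, t.osm_date,
       md5(t.tags::text) AS tag_hash
    FROM osmts.tags t;

-- Rows whose attribute data changed between two snapshots.
SELECT n.osm_id
    FROM osmts.tags n        -- newer snapshot
    INNER JOIN osmts.tags o  -- older snapshot
        ON o.osm_id = n.osm_id
    WHERE o.osm_date = '2018-01-01'  -- placeholder snapshot dates
        AND n.osm_date = '2021-08-01'
        AND md5(o.tags::text) <> md5(n.tags::text);
```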

Continue Reading

Timescale, Compression and OpenStreetMap Tags

By Ryan Lambert -- Published August 20, 2021

This post captures my initial exploration with the TimescaleDB extension in Postgres. I have watched Timescale with interest for quite some time but had not really experimented with it before now. I am considering Timescale as another solid option for improving my long-term storage of OpenStreetMap data snapshots. Naturally, I am using PostGIS-enabled databases filled with OpenStreetMap data.

I started looking at restructuring our OpenStreetMap data with my post Why Partition OpenStreetMap data? That post has an overview of the historic use case I need to support. While my first attempt at declarative partitioning ran into a snag, my second attempt worked rather well. This post looks beyond my initial requirements for the project and establishes additional benefits from adding Timescale into our databases.

Timescale benefits

There are two main reasons I am looking into Timescale as an option over Postgres' built-in declarative partitioning:

  • No need to manually create partitions
  • Compression is tempting

New partitions with Postgres' declarative partitioning must be created manually. The syntax isn't terribly tricky and the process can be automated, but the process still exists and therefore still needs to be managed. With Timescale's hypertables, new partitions are handled behind the scenes without my direct intervention. The other temptation from Timescale is their columnar-style compression on row-based data. In standard Postgres, compression only kicks in at the row level, when a single row exceeds a specified size (default 2kb); see my post on large text data in Postgres for a discussion of compression in Postgres. Timescale has been writing about their compression, so I figured it was time to give it a go. While compression wasn't one of the original goals I had outlined... it would be nice!
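For a rough sense of what this looks like in practice, here is a sketch, not the actual configuration from the post: the osm_date time column, the one-year intervals, and the segmentby choice are all assumptions.

```sql
-- Convert an existing table into a hypertable; Timescale then
-- creates new chunks (partitions) automatically as data arrives.
SELECT create_hypertable('osmts.tags', 'osm_date',
                         chunk_time_interval => INTERVAL '1 year');

-- Enable compression and compress chunks older than one year.
ALTER TABLE osmts.tags SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'osm_id'  -- assumed column
);
SELECT add_compression_policy('osmts.tags', INTERVAL '1 year');
```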

Continue Reading
