Postgres and Software Development: PGSQL Phriday #004
This blog post is part of PGSQL Phriday #004. Hettie Dombrovskaya is this month's host! I was very excited to see the topic chosen as Postgres and Software Development. My contribution to this month's #pgsqlphriday topic covers how I manage code through our development processes. Check out Hettie's post for more about this month's topic.
Types of code
Before continuing on to the specific questions for the challenge, I want to define the broad groupings of code I work with.
- Mission critical
- Not trivial
- Trivial
Mission critical code is where most of the "special sauce" is. Mission critical SQL code includes the DDL commands that create the database structure, such as `CREATE TABLE foo` and `CREATE VIEW baz`. This level of code represents the data structure that enables everything else to function.
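A minimal sketch of what that mission critical DDL can look like; the table and view here are illustrative placeholders, not structures from a real project:

```sql
-- Hypothetical structure: a base table plus a view that reports on it.
CREATE TABLE foo (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- The view gives downstream code a stable reporting interface.
CREATE VIEW baz AS
SELECT date_trunc('day', created_at) AS created_day,
        COUNT(*) AS row_count
    FROM foo
    GROUP BY date_trunc('day', created_at);
```

Everything downstream, from application queries to reports, depends on this layer being right.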
Trivial SQL code is at the other end of the spectrum from mission critical code. Trivial SQL code is made up of queries anyone familiar with the database could craft in 5 minutes or less. Simple selects, filtering, and aggregates come to mind here. The "not trivial" code is everything between trivial and mission critical. Queries with complex filters, window functions, or multiple steps classify as not trivial.
With these classifications for our code in place we can go back to the questions!
SQL, source control, and testing
Question: Do you store your SQL code in GitHub (or other source control, e.g. GitLab)?
Yes! All mission critical SQL I am involved with is saved in a Git repo, either public or private depending on the project. The use of source control for mission critical code (SQL, Python, etc.) is non-negotiable. A good portion of my "not trivial" code is also stored in source control, with my trend leaning towards more code in source control. The trivial code isn't worth the effort of putting it into source control or the bloat it creates in those projects.
Question: Do you use pgTAP? Do you think it’s a good practice to have pgTAP tests in your repo, or does it not add value?
I have not used pgTAP and know very little about it, other than it exists. That isn't to say pgTAP isn't helpful, just that it covers functionality that I have covered through other tools. If you generalize the question to "Do you think it's a good practice to have tests in your repo?" my answer is a definite Yes! The tool isn't important, the culture of testing your code is.
I use Sqitch to deploy mission critical code through migrations. Sqitch provides the option to use verify scripts that can handle many of the types of testing pgTAP would provide for me. For testing specific database functionality with data involved, I approach that through the applications themselves. My approach has been to make it trivial to set up test databases with known data and validate database functionality from the app. This approach means for application testing I'm not wasting time mocking database connections and objects. I just... use the database. This ensures functionality works as expected through the entire stack.
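As a sketch of what a Sqitch verify script can look like, assuming a migration that created a `foo` table (the names are illustrative): Sqitch treats any error raised by the script as a failed verification, so simply selecting the expected columns is enough to confirm the structure exists.

```sql
-- Verify script: errors (and fails the verify) if the table
-- or any of the required columns are missing.
BEGIN;

SELECT id, name, created_at
    FROM foo
    WHERE FALSE;

ROLLBACK;
```

The `WHERE FALSE` clause keeps the check fast since no rows are actually read.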
Helper scripts and giving back
Two of Hettie's questions relate to helper scripts and giving back.
- Do you have any of your own scripts which make your life as a DBA/Database developer/consultant easier?
- What do they do? Do you share them with others, or are they hidden in your secret toolbox?
Yes, of course I have a collection of scripts. The vast majority of the helpful code I've written is available to the public for free in one form or another. The longest standing form I've used to share is the RustProof Labs blog. Another way I have shared helpful code back to the community is through open source projects. These two avenues for sharing are explored next.
Sharing via blog posts
I started our blog in July 2013 🤯 and am approaching 200 published posts; this post will be #170! A good portion of the posts I have published over the years are basically collections of scripts with detailed explanations between the queries.
Want to know how I created the following animated image using PostGIS, trajectories, and QGIS? If so, PostGIS Trajectory: Space plus Time is the post for you! That blog post is an example of some "not trivial" code.
A big part of what I do as a consultant is make recommendations. I like to back those up with something more concrete than a conversation and a hunch. For example, when I recommend upgrading an instance to at least Postgres 13 to reduce the database's B-Tree index sizes, I reference the TLDR section here. That post shows how much benefit later Postgres versions provide through reduced B-Tree index sizes, as well as examples where little or no benefit will be gained.
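Checking whether your own indexes stand to benefit is straightforward; a sketch using the system catalogs (run it before and after an upgrade and compare the sizes):

```sql
-- List B-Tree indexes with their on-disk sizes, largest first.
SELECT c.relname AS index_name,
        pg_size_pretty(pg_relation_size(c.oid)) AS index_size
    FROM pg_class c
    INNER JOIN pg_am am ON c.relam = am.oid
    WHERE c.relkind = 'i'
        AND am.amname = 'btree'
    ORDER BY pg_relation_size(c.oid) DESC
    LIMIT 10;
```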
Similarly, other posts dig into questions such as: should we disable the default compression used by TOAST?
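For context on that question, Postgres 14 and newer let you choose the TOAST compression method per column or change the default via the `default_toast_compression` setting; a sketch with an illustrative table name:

```sql
-- Postgres 14+: use lz4 instead of the default pglz for one column...
CREATE TABLE docs (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    body TEXT COMPRESSION lz4
);

-- ...or change the default for the session (or instance-wide
-- via postgresql.conf).
SET default_toast_compression = 'lz4';
```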
Each one of those posts includes the code and process I used so others
can replicate and either confirm or dispute my findings on their hardware.
Sharing via projects
Our data dictionary project PgDD (repo) is a great example of the lifecycle seen by truly helpful helper scripts. When I first introduced my Postgres data dictionary in 2018, I had accumulated a set of queries based on the slash commands in `psql`. Later that same year, those queries were formalized into the initial open sourced version of the PgDD project. This initial version used Sqitch for deployment, which made it easy for me to deploy that project into multiple databases internally while keeping the defined views in sync across my projects. Later in 2019, PgDD became a PostgreSQL extension to make it easier to share the project with teams not using Sqitch. Originally the extension was built using raw SQL, then in 2021 it was rewritten using the pgx framework.
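The kind of query that seeds a project like this is easy to picture; here is a sketch of a data-dictionary style query over the system catalogs, similar in spirit to `psql`'s `\dt+` slash command (this is not PgDD's actual view definition):

```sql
-- Tables with estimated row counts and sizes, user schemas only.
SELECT n.nspname AS schema_name,
        c.relname AS table_name,
        c.reltuples::BIGINT AS estimated_rows,
        pg_size_pretty(pg_table_size(c.oid)) AS table_size
    FROM pg_class c
    INNER JOIN pg_namespace n ON c.relnamespace = n.oid
    WHERE c.relkind = 'r'
        AND n.nspname NOT IN ('pg_catalog', 'information_schema')
    ORDER BY c.reltuples DESC;
```

Wrapping queries like this into versioned views is what turns a pile of snippets into a maintainable project.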
Some of the other helpful projects I have open sourced are listed below. Each of these started as a rough collection of scripts that solidified into their current forms.
- PgOSM Flex: High quality OpenStreetMap easily loaded to PostGIS using a Docker image
- Convert Postgres extension built with pgx: Collection of functions to convert between various units of measurement
- PgConfig (GitHub): Helps track configuration changes between major Postgres versions to ease upgrade planning
- SRID Bounding Boxes: Easily find a localized SRID for specific areas (e.g. Denver Colorado) in specific units (meters)
My eBook Mastering PostGIS and OpenStreetMap explains SRIDs and their importance in Chapter 5. Chapter 17 illustrates how to use the SRID Bounding Box project to provide accurate calculations anywhere in the world, without losing performance benefits or functionality.
The next question continues with the theme of helpful scripts from the broader community perspective.
Question: There are many “scripts” to do all sorts of things in PostgreSQL that are stored everywhere and nowhere specifically (like the above mentioned bloat scripts, lists of unused indexes, you name it). Do you think they should be a part of the PostgreSQL official documentation and re-verified for each new major version?
Community resources are an invaluable part of my toolbox. I can't count how many times I have referred someone to those scripts. The Postgres Wiki's Don't Do This page is another page that I refer others to often, typically when discussing TIMESTAMPTZ data types.
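The timestamp advice is easy to demonstrate; a small sketch of why that page steers people toward `TIMESTAMPTZ`:

```sql
-- TIMESTAMPTZ stores an unambiguous point in time and renders it
-- in the session's time zone; plain TIMESTAMP carries no zone at all.
SET timezone = 'America/Denver';
SELECT '2023-01-06 12:00:00+00'::TIMESTAMPTZ AS with_tz,
        '2023-01-06 12:00:00'::TIMESTAMP AS without_tz;
```

The `with_tz` value displays adjusted to Denver time, while `without_tz` is returned exactly as typed with no indication of what zone it was recorded in.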
How much should be maintained by the core Postgres community? Only
as much as someone is willing to maintain. I don't think the main
documentation is the appropriate place for many of the helpful scripts
though. For example, it is trivial to automate a
git clone to
download the bloat queries to whatever machine they need to be on.
If those scripts were in a documentation page, that ability goes away.
I am excited to see all of the other contributions for this month's
#PGSQLPhriday topic! There are so many tools and approaches to this
topic, I have no doubt that I'll discover and learn something new
through this month's round of sharing.
Need help with your PostgreSQL servers or databases? Contact us to start the conversation!
Published January 06, 2023
Last Updated January 06, 2023