Postgres and Software Development: PGSQL Phriday #004
This blog post is part of PGSQL Phriday #004. Hettie Dombrovskaya is this month's host! I was very excited to see the topic chosen as Postgres and Software Development. My contribution to this month's #pgsqlphriday topic covers how I manage code through our development processes. Check out Hettie's post for more about this month's topic.
Types of code
Before continuing on to the specific questions for the challenge, I want to define the broad groupings of code I work with.
- Mission critical
- Not trivial
- Trivial
Mission critical code is where most of the "special sauce" is. Mission critical SQL code includes the DDL commands that create the database structure, such as `CREATE TABLE foo` and `CREATE VIEW baz`. This level of code represents the data structure that enables everything else to function.
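A minimal sketch of what that mission critical DDL can look like; the table and view here are illustrative placeholders, not structures from a real project:

```sql
-- Hypothetical structure: a base table plus a view that reports on it.
CREATE TABLE foo (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- The view gives downstream code a stable reporting interface.
CREATE VIEW baz AS
SELECT date_trunc('day', created_at) AS created_day,
        COUNT(*) AS row_count
    FROM foo
    GROUP BY date_trunc('day', created_at);
```

Everything downstream, from application queries to reports, depends on this layer being right.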
Trivial SQL code is at the other end of the spectrum from mission critical code. Trivial SQL code is made up of queries anyone familiar with the database could craft in 5 minutes or less. Simple selects, filtering, and aggregates come to mind here. The "not trivial" code is everything between trivial and mission critical. Queries with complex filters, window functions, or multiple steps classify as not trivial.
With these classifications for our code in place we can go back to the questions!
SQL, source control, and testing
Question: Do you store your SQL code in GitHub (or other source control, e.g. GitLab)?
Yes! All mission critical SQL I am involved with is saved in a Git repo, either public or private depending on the project. The use of source control for mission critical code (SQL, Python, etc.) is non-negotiable. A good portion of my "not trivial" code is also stored in source control, with my trend leaning towards more code in source control. The trivial code isn't worth the effort of putting it into source control or the bloat it creates in those projects.
Question: Do you use pgTAP? Do you think it’s a good practice to have pgTAP tests in your repo, or does it not add value?
I have not used pgTAP and know very little about it, other than it exists. That isn't to say pgTAP isn't helpful, just that it covers functionality that I have covered through other tools. If you generalize the question to "Do you think it's a good practice to have tests in your repo?" my answer is a definite Yes! The tool isn't important, the culture of testing your code is.
I use Sqitch to deploy mission critical code through migrations. Sqitch provides the option to use verify scripts that can handle many of the types of testing pgTAP would provide for me. For testing specific database functionality with data involved, I approach that through the applications themselves. My approach has been to make it trivial to set up test databases with known data and validate database functionality from the app. This approach means for application testing I'm not wasting time mocking database connections and objects. I just... use the database. This ensures functionality works as expected through the entire stack.
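As a sketch of what a Sqitch verify script can look like, assuming a migration that created a `foo` table (the names are illustrative): Sqitch treats any error raised by the script as a failed verification, so simply selecting the expected columns is enough to confirm the structure exists.

```sql
-- Verify script: errors (and fails the verify) if the table
-- or any of the required columns are missing.
BEGIN;

SELECT id, name, created_at
    FROM foo
    WHERE FALSE;

ROLLBACK;
```

The `WHERE FALSE` clause keeps the check fast since no rows are actually read.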
Helper scripts and giving back
Two of Hettie's questions relate to helper scripts and giving back.
- Do you have any of your own scripts which make your life as a DBA/Database developer/consultant easier?
- What do they do? Do you share them with others, or are they hidden in your secret toolbox?
Yes, of course I have a collection of scripts. The vast majority of the helpful code I've written is available to the public for free in one form or another. The longest standing form I've used to share is the RustProof Labs blog. Another way I have shared helpful code back to the community is through open source projects. These two avenues for sharing are explored next.
Sharing via blog posts
I started our blog in July 2013 🤯 and am approaching 200 published posts; this post will be #170! A good portion of the posts I have published over the years are basically collections of scripts with detailed explanations between the queries.
Want to know how I created the following animated image using PostGIS, trajectories, and QGIS? If so, PostGIS Trajectory: Space plus Time is the post for you! That blog post is an example of some "not trivial" code.
A big part of what I do as a consultant is make recommendations. I like to back those up with something more concrete than a conversation and a hunch. For example, when I recommend upgrading an instance to at least Postgres 13 to reduce the database's B-Tree index sizes, I reference the TLDR section here. That post shows how much benefit later Postgres versions provide through reduced B-Tree index sizes, as well as examples where little or no benefit will be gained.
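Checking whether your own indexes stand to benefit is straightforward; a sketch using the system catalogs (run it before and after an upgrade and compare the sizes):

```sql
-- List B-Tree indexes with their on-disk sizes, largest first.
SELECT c.relname AS index_name,
        pg_size_pretty(pg_relation_size(c.oid)) AS index_size
    FROM pg_class c
    INNER JOIN pg_am am ON c.relam = am.oid
    WHERE c.relkind = 'i'
        AND am.amname = 'btree'
    ORDER BY pg_relation_size(c.oid) DESC
    LIMIT 10;
```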
Similarly, other posts dig into questions such as: should we disable the default compression used by TOAST?
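For context on that question, Postgres 14 and newer let you choose the TOAST compression method per column or change the default via the `default_toast_compression` setting; a sketch with an illustrative table name:

```sql
-- Postgres 14+: use lz4 instead of the default pglz for one column...
CREATE TABLE docs (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    body TEXT COMPRESSION lz4
);

-- ...or change the default for the session (or instance-wide
-- via postgresql.conf).
SET default_toast_compression = 'lz4';
```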
Each one of those posts includes the code and process I used so others
can replicate and either confirm or dispute my findings on their hardware.
Sharing via projects
Our data dictionary project PgDD (repo) is a great example of the lifecycle seen by truly helpful helper scripts. When I first introduced my Postgres data dictionary in 2018, I had accumulated a set of queries based on the slash commands in `psql`. Later that same year, those queries were formalized into the initial open sourced version of the PgDD project. This initial version used Sqitch for deployment, which made it easy for me to deploy that project into multiple databases internally while keeping the defined views in sync across my projects. Later in 2019, PgDD became a PostgreSQL extension to make it easier to share the project with teams not using Sqitch. Originally the extension was built using raw SQL, then in 2021 it was rewritten using the pgx framework.
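The kind of query that seeds a project like this is easy to picture; here is a sketch of a data-dictionary style query over the system catalogs, similar in spirit to `psql`'s `\dt+` slash command (this is not PgDD's actual view definition):

```sql
-- Tables with estimated row counts and sizes, user schemas only.
SELECT n.nspname AS schema_name,
        c.relname AS table_name,
        c.reltuples::BIGINT AS estimated_rows,
        pg_size_pretty(pg_table_size(c.oid)) AS table_size
    FROM pg_class c
    INNER JOIN pg_namespace n ON c.relnamespace = n.oid
    WHERE c.relkind = 'r'
        AND n.nspname NOT IN ('pg_catalog', 'information_schema')
    ORDER BY c.reltuples DESC;
```

Wrapping queries like this into versioned views is what turns a pile of snippets into a maintainable project.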
Some of the other helpful projects I have open sourced are listed below. Each of these started as a rough collection of scripts that solidified into their current forms.
- PgOSM Flex: High quality OpenStreetMap easily loaded to PostGIS using a Docker image
- Convert Postgres extension built with pgx: Collection of functions to convert between various units of measurement
- PgConfig (GitHub): Helps track configuration changes between major Postgres versions to ease upgrade planning
- SRID Bounding Boxes: Easily find a localized SRID for specific areas (e.g. Denver Colorado) in specific units (meters)
My eBook Mastering PostGIS and OpenStreetMap explains SRIDs and their importance in Chapter 5. Chapter 17 illustrates how to use the SRID Bounding Box project to provide accurate calculations anywhere in the world, without losing performance benefits or functionality.
The next question continues with the theme of helpful scripts from the broader community perspective.
Question: There are many “scripts” to do all sorts of things in PostgreSQL that are stored everywhere and nowhere specifically (like the above mentioned bloat scripts, lists of unused indexes, you name it). Do you think they should be a part of the PostgreSQL official documentation and re-verified for each new major version?
Community resources are an invaluable part of my toolbox. I can't count how many times I have referred someone to those scripts. The Postgres Wiki's Don't Do This page is another page that I refer others to often, typically when discussing TIMESTAMPTZ data types.
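The timestamp advice is easy to demonstrate; a small sketch of why that page steers people toward `TIMESTAMPTZ`:

```sql
-- TIMESTAMPTZ stores an unambiguous point in time and renders it
-- in the session's time zone; plain TIMESTAMP carries no zone at all.
SET timezone = 'America/Denver';
SELECT '2023-01-06 12:00:00+00'::TIMESTAMPTZ AS with_tz,
        '2023-01-06 12:00:00'::TIMESTAMP AS without_tz;
```

The `with_tz` value displays adjusted to Denver time, while `without_tz` is returned exactly as typed with no indication of what zone it was recorded in.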
How much should be maintained by the core Postgres community? Only
as much as someone is willing to maintain. I don't think the main
documentation is the appropriate place for many of the helpful scripts
though. For example, it is trivial to automate a
git clone to
download the bloat queries to whatever machine they need to be on.
If those scripts were in a documentation page, that ability goes away.
I am excited to see all of the other contributions for this month's
#PGSQLPhriday topic! There are so many tools and approaches to this
topic, I have no doubt that I'll discover and learn something new
through this month's round of sharing.
Need help with your PostgreSQL servers or databases? Contact us to start the conversation!
Published January 06, 2023
Last Updated January 06, 2023