
Backups for Postgres - PGSQL Phriday #002

By Ryan Lambert -- Published November 04, 2022

This blog post is for PGSQL Phriday #002. Read Ryan Booz' introduction from September for more details on PGSQL Phriday. Andreas Scherbaum is this month's host and chose the topic: Postgres backups!

The topic, reposted here:


Which tool(s) are you using, where do you store backups, how often do you do backups? Are there any recommendations you can give the reader how to improve their backups? Any lesser known features in your favorite backup tool? Any new and cool features in a recently released version?

Bonus question: Is pg_dump a backup tool?


What is a backup?

I address most of Ads' questions in this post, but before we dive in we need to define "backup." Merriam Webster has a definition for backup in the context of computers:

backup (3): "a copy of computer data (such as a file or the contents of a hard drive)"

I'm running with this simple definition of backup for today's post. To frame the definition of "backup" in a non-Postgres context: Suppose I have a "business document." I want to make some major changes but am afraid of accidentally losing something important. What do I do? I copy / paste the file, change the name to include today's date, and edit away. Did I create a backup of the original document? Sure. In a way. Is it the same thing as when the IT department backs up the network drive where both the original and newly modified files are saved? Nope. Do both approaches serve their purpose? Yes!

Database backups are similar. There isn't a one-size-fits-all solution.

Postgres backups: 1st Line of Defense

At RustProof Labs, our first layer of backups for our Postgres databases is Postgres' built-in streaming replication to secondary read-only instances. A "backup" you say? Well, it's a full copy of the data, it can be queried with near-real-time updates, and if the primary server goes down? Our procedure is to promote the hot standby to primary using pg_promote(). In this scenario we might lose a few seconds' worth of data, typically less. Interruptions to the various non-critical services we host that rely on Postgres could last longer, from minutes to possibly hours depending on the exact scenario. Those interruptions are planned for via SLAs.
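The promotion itself is a single function call run on the standby. A minimal sketch, assuming PostgreSQL 12 or newer; the timeout value is illustrative:

-- Run on the hot standby; returns true once promotion completes
SELECT pg_promote(wait => true, wait_seconds => 60);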

After services are restored the next priority is to replace the missing secondary node. Start to finish, while brewing coffee, I can do that in under 30 minutes, mostly waiting on Ansible to run. Unless we've had cascading failures, this step is zero stress.

The streaming replica is always in a separate data center from the primary instance. This provides protection against localized hardware failures, network failures, and natural disaster impacts that affect a single node and/or a single location. Between hurricanes, fires, and blizzards... it doesn't really matter where you host, you may need to vacate your primary location in a hurry.

Streaming replication provides a specific type of backup. It does not, however, protect from a user running DROP SCHEMA most_important_data CASCADE; while accidentally on prod. That's where our main backup solution comes in.

Postgres backups: Main Solution

Our main backup solution is pgBackRest. With pgBackRest we have a proper backup solution with full, differential, and incremental options. It provides compression, encryption, and DBA-defined retention rules to automatically expire old backups. Between specific backups and the Point in Time Recovery (PITR) capabilities, pgBackRest enables recreating entire instances or recovering deleted or unintentionally altered data.

One additional reason I recommend pgBackRest is that their user guides are fantastic! Their default user guide targets Debian / Ubuntu systems and provides an easy-to-follow introduction for newcomers as well as a reliable resource for existing users.
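To give a feel for the configuration involved, here is a minimal pgbackrest.conf sketch covering compression, encryption, and retention. The stanza name, paths, hosts, and retention values are illustrative, not our production settings:

# /etc/pgbackrest/pgbackrest.conf (sketch)
[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2
repo1-retention-diff=4
repo1-cipher-type=aes-256-cbc
repo1-cipher-pass=change-me
compress-type=zst
start-fast=y

[my_cool_db]
pg1-host=pg-primary.example.com
pg1-host-user=postgres
pg1-path=/var/lib/postgresql/14/main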

Our pgBackRest instance is dedicated to only that one task: it takes backups and collects WAL. The pgBackRest node is in the same data center as at least one Postgres instance; which instance it sits next to (primary / secondary) isn't terribly important. The instance is configured with cron to run backups every 12 hours: a full pgBackRest backup weekly, plus a mix of differential and incremental backups on the other days. The example output from the pgbackrest info command below shows the Postgres instance it is backing up is 14.7 GB while the compressed backup comes in at 4.4 GB.

pgbackrest --stanza my_cool_db info

    full backup: 20221030-023008F
        timestamp start/stop: 2022-10-30 02:30:08 / 2022-10-30 03:22:29
        wal start/stop: 000000010000001600000075 / 000000010000001600000075
        database size: 14.7GB, database backup size: 14.7GB
        repo1: backup set size: 4.4GB, backup size: 4.4GB

    incr backup: 20221030-023008F_20221030-143006I
        timestamp start/stop: 2022-10-30 14:30:06 / 2022-10-30 14:32:17
        wal start/stop: 000000010000001600000077 / 000000010000001600000077
        database size: 14.7GB, database backup size: 3.1GB
        repo1: backup set size: 4.4GB, backup size: 350.2MB
        backup reference list: 20221030-023008F

    diff backup: 20221030-023008F_20221031-023006D
        timestamp start/stop: 2022-10-31 02:30:06 / 2022-10-31 02:32:16
        wal start/stop: 000000010000001600000079 / 000000010000001600000079
        database size: 14.7GB, database backup size: 3.1GB
        repo1: backup set size: 4.4GB, backup size: 350.5MB
        backup reference list: 20221030-023008F
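For reference, a cron schedule along those lines could look like the following sketch; the stanza name and exact times are illustrative:

# m h  dom mon dow  command
30 2   *   *   0    pgbackrest --stanza=my_cool_db --type=full backup
30 2   *   *   1-6  pgbackrest --stanza=my_cool_db --type=diff backup
30 14  *   *   *    pgbackrest --stanza=my_cool_db --type=incr backup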

Bonus Question: Is pg_dump a backup tool?

Is pg_dump a backup tool? Using the definition above: Yes, pg_dump provides a backup.

Now on to what I think is the more important question: Should you use pg_dump for your primary backup solution? I wouldn't! I use pg_dump to extract processed data from the PgOSM Flex Docker containers. I also use pg_dump occasionally to load current production data to a test instance for some urgent testing need. If it isn't urgent, I'll typically stand up a new replica instead of using pg_dump.

If you are using pg_dump for any reason, make sure you document which options you use and how to restore the database into a different instance. The format you use (plain, custom, directory) determines how you can restore the database, either psql or pg_restore.
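As a sketch of what that documentation might capture, assuming a database named my_cool_db and the custom format:

# Dump using the custom format (restored with pg_restore, not psql)
pg_dump --format=custom --file=my_cool_db.dump my_cool_db

# Restore into a different database, dropping conflicting objects first
pg_restore --dbname=my_cool_db_test --clean --if-exists my_cool_db.dump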

Recommendation

The key with any backup solution is understanding how your backup and restore processes work. Have recent, tested procedures that describe how to restore your backup to production environments. A backup you can't restore is just 0s and 1s taking up space.

Simulate bad things happening and practice your response. I've found the best way to practice is to regularly use your backup tool(s) to bootstrap your development and QA Postgres instances. Do this when there is no time pressure and failure carries no cost. Getting it wrong under zero pressure is how you learn and adjust your documentation. Repeatedly getting it right during practice is how you reduce the stress of emergencies.
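With pgBackRest, bootstrapping a test instance or rehearsing point-in-time recovery comes down to a couple of commands. A sketch, assuming the stanza from earlier and a stopped Postgres cluster on the target host; the recovery target time is illustrative:

# Restore the latest backup, only rewriting files that differ
sudo -u postgres pgbackrest --stanza=my_cool_db --delta restore

# Point-in-time recovery to just before an accidental change
sudo -u postgres pgbackrest --stanza=my_cool_db --delta \
    --type=time --target="2022-10-31 02:00:00" restore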

Practice, practice, practice! When it matters, you can rely on experience and muscle memory. Procedures and testing are critical.

Summary

Thank you, Ads, for hosting and choosing a fun topic! I look forward to seeing the other posts on backups from the community.

Need help with your PostgreSQL servers or databases? Contact us to start the conversation!
