Backups for Postgres - PGSQL Phriday #002
This blog post is for PGSQL Phriday #002. Read Ryan Booz' introduction from September for more details on PGSQL Phriday. Andreas Scherbaum is this month's host and chose the topic: Postgres backups!
The topic, reposted here:
Which tool(s) are you using, where do you store backups, how often do you do backups? Are there any recommendations you can give the reader how to improve their backups? Any lesser known features in your favorite backup tool? Any new and cool features in a recently released version?
Bonus question: Is pg_dump a backup tool?
What is a backup?
I address most of Ads' questions in this post, but before we dive in we need to define "backup." Merriam-Webster has a definition for backup in the context of computers:
backup (3): "a copy of computer data (such as a file or the contents of a hard drive)"
I'm running with this simple definition of backup for today's post. To frame the definition of "backup" in a non-Postgres context: Suppose I have a "business document." I want to make some major changes but am afraid of accidentally losing something important. What do I do? I copy / paste the file, change the name to include today's date, and edit away. Did I create a backup of the original document? Sure. In a way. Is it the same thing as when the IT department backs up the network drive where both the original and the newly modified files are saved? Nope. Do both approaches serve their purpose? Yes!
Database backups are similar. There isn't a one-size-fits-all solution.
Postgres backups: 1st Line of Defense
At RustProof Labs, our first layer of backups for our Postgres databases is via Postgres' built-in streaming replication to secondary read-only instances. A "backup" you say? Well, it's a full copy of the data, it can be queried with near-real-time updates, and if the primary server goes down? Our procedure is to promote the hot standby to primary using pg_promote(). In this scenario we might lose a few seconds worth of data, typically less.
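Promotion itself is a single function call on the standby. A minimal sketch, assuming Postgres 12 or newer; the wait parameters shown are simply the function's defaults:

-- Run on the hot standby to promote it to the new primary.
SELECT pg_promote(wait => true, wait_seconds => 60);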
The interruptions to various non-critical services we host that rely on Postgres could last longer, minutes or possibly even hours depending on the exact scenario. Those interruptions are planned for via SLAs.
After services are restored the next priority is to replace the missing secondary node. Start to finish, while brewing coffee, I can do that in under 30 minutes, mostly waiting on Ansible to run. Unless we've had cascading failures, this step is zero stress.
The streaming replica is always in a separate data center from the primary instance. This provides protection against localized hardware failures, network failures, and natural disaster impacts that affect a single node and/or a single location. Between hurricanes, fires, and blizzards... it doesn't really matter where you host, you may need to vacate your primary location in a hurry.
Streaming replication provides a specific type of backup. It does not, however, protect from a user accidentally running DROP SCHEMA most_important_data CASCADE; on prod. That's where our main backup solution comes in.
Postgres backups: Main Solution
Our main backup solution is pgBackRest. With pgBackRest we have a proper backup solution with full, differential, and incremental options. It provides compression, encryption, and DBA-defined retention rules to auto-expire old backups. Between the backups themselves and the Point in Time Recovery (PITR) capabilities, pgBackRest enables recreating entire instances or recovering deleted or unintentionally altered data.
One additional reason I recommend pgBackRest is that its user guides are fantastic! The default user guide is for Debian / Ubuntu systems and provides an easy-to-follow introduction and a reliable resource for existing users.
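To give a sense of the configuration involved, here is a minimal pgbackrest.conf sketch. The stanza name, paths, retention, and encryption settings are illustrative assumptions, not our production values:

# /etc/pgbackrest/pgbackrest.conf -- hypothetical minimal example
[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2
repo1-cipher-type=aes-256-cbc
repo1-cipher-pass=change-me
compress-type=zst
start-fast=y

[my_cool_db]
pg1-path=/var/lib/postgresql/14/main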
Our pgBackRest instance is dedicated to only that one task: it takes backups and collects WAL. The pgBackRest node is in the same data center as at least one Postgres instance; which instance it is close to (primary / secondary) isn't terribly important. The instance is configured with cron to run backups every 12 hours: a full pgBackRest backup weekly and a mix of differential and incremental backups a couple of times daily.
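A crontab sketch matching that cadence, assuming the backups run as the postgres user against a stanza named my_cool_db; the exact times are placeholders, not our production schedule:

# Full on Sunday morning, differential the other mornings, incremental every afternoon
30 2 * * 0    pgbackrest --stanza=my_cool_db --type=full backup
30 2 * * 1-6  pgbackrest --stanza=my_cool_db --type=diff backup
30 14 * * *   pgbackrest --stanza=my_cool_db --type=incr backup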
The example output from the pgbackrest info command shows the Postgres instance it is backing up is 14.7 GB, while the compressed backup totals 4.4 GB.
pgbackrest --stanza my_cool_db info

full backup: 20221030-023008F
    timestamp start/stop: 2022-10-30 02:30:08 / 2022-10-30 03:22:29
    wal start/stop: 000000010000001600000075 / 000000010000001600000075
    database size: 14.7GB, database backup size: 14.7GB
    repo1: backup set size: 4.4GB, backup size: 4.4GB

incr backup: 20221030-023008F_20221030-143006I
    timestamp start/stop: 2022-10-30 14:30:06 / 2022-10-30 14:32:17
    wal start/stop: 000000010000001600000077 / 000000010000001600000077
    database size: 14.7GB, database backup size: 3.1GB
    repo1: backup set size: 4.4GB, backup size: 350.2MB
    backup reference list: 20221030-023008F

diff backup: 20221030-023008F_20221031-023006D
    timestamp start/stop: 2022-10-31 02:30:06 / 2022-10-31 02:32:16
    wal start/stop: 000000010000001600000079 / 000000010000001600000079
    database size: 14.7GB, database backup size: 3.1GB
    repo1: backup set size: 4.4GB, backup size: 350.5MB
    backup reference list: 20221030-023008F
Bonus Question: Is pg_dump a backup tool?
Is pg_dump a backup tool? Using the definition above: yes, pg_dump provides a backup.
Now on to what I think is the more important question: should you use pg_dump for your primary backup solution? I wouldn't!
I use pg_dump to extract processed data from the PgOSM Flex Docker containers. I also use pg_dump occasionally to load current production data to a test instance for some urgent testing need. If it isn't urgent, I'll typically stand up a new replica instead of using pg_dump.
If you are using pg_dump for any reason, make sure you document which options you use and how to restore the database into a different instance. The format you use (plain, custom, directory) determines how you restore: plain-format dumps are restored with psql, while custom and directory formats are restored with pg_restore.
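As a sketch of what that documentation might capture (the database names, test host, and chosen options here are placeholders, not a recommendation for your environment):

# Custom-format dump, compressed, restorable with pg_restore
pg_dump -Fc -d my_cool_db -f my_cool_db.dump

# Restore into a different instance, skipping ownership changes on the target
pg_restore -h test-host -d my_cool_db_test --no-owner my_cool_db.dump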
Recommendation
The key with any backup solution is understanding how both the backup and the restore work. Have recent, tested procedures that describe how to restore your backup to production environments. A backup you can't restore is just 0s and 1s taking up space.
Simulate bad things happening and practice your response. I've found the best way to practice is to regularly use your backup tool(s) to bootstrap your development and QA Postgres instances. Do this when there is no timing pressure for failure. Getting it wrong under zero pressure is how you learn and adjust your documentation. Repeatedly getting it right during practice is how you reduce stress from emergencies.
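For example, bootstrapping a development instance from a recent backup with pgBackRest might look roughly like the following. The service name, stanza, and recovery target are assumptions, and a delta restore into an existing data directory is shown:

# Hypothetical practice run on a dev node: stop Postgres, restore to a point in time, restart
sudo systemctl stop postgresql
sudo -u postgres pgbackrest --stanza=my_cool_db --delta \
    --type=time --target="2022-10-31 02:00:00" \
    --target-action=promote restore
sudo systemctl start postgresql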
Practice, practice, practice! When it matters, you can rely on experience and muscle memory. Procedures and testing are critical.
Summary
Thank you, Ads, for hosting and choosing a fun topic! I look forward to seeing the other posts on backups from the community.
Need help with your PostgreSQL servers or databases? Contact us to start the conversation!