PostgreSQL performance on Raspberry Pi
Have you ever wondered how well PostgreSQL performs on a
Raspberry Pi 3B?
Oh. You haven't? Oh well, I asked myself that question
and put it to the test. This post covers how I used pgbench
to answer this question and the results of what I found.
I love the Raspberry Pi, my post Low Power Computing with Raspberry Pi explains more.
This post is part of the series PostgreSQL: From Idea to Database.
TLDR;
Modern PostgreSQL (>=9.6), out-of-the-box, achieves around 200 TPS
(TPC-B (sort of)
) on a Raspberry Pi 3B.
That's 17+ million transactions per day!
Note: These tests used non-app rated SD cards. See my post on SD cards for more details.
What was tested
I tested four PostgreSQL versions, 9.6.12, 10.7, 11.2, and 12-dev. Tests
were ran using four Raspberry Pi 3Bs mounted as a Rack-o-Pi
. The first
round of testing used one version of Postgres installed on each
Pi. For the rest of the testing I focused on Pg11 and Pg12.
These little machines put in more than 14 hours of digital blood, sweat and tears to produce this post!
All versions of Postgres tested were built from source.
pgBench
ran locally on each machine.
Postgres Configuration
I kept most of the default configurations for each version, though there are four (4) settings I controlled for this testing.
max_parallel_workers_per_gather
wal_level
checkpoint_timeout
checkpoint_completion_target
All other settings are whatever the default was for that setting in the tested version.
The defaults for checkpoint_timeout
and checkpoint_completion_target
(5min
and 0.5
) don't work well for this case,
I've adjusted those to 30min
and 0.8
for all tests. I didn't
capture the results from those initial tests with the default settings,
but the difference was noticeable.
wal_level
deserves a mention; the default in Pg9.6 was minimal
and in Pg10 and later the default is replica
. This change requires a lot of
extra WAL activity that seriously hinders performance on the Pi.
All tests here used wal_level = minimal
. (Again, initial tests with replica were noticably slower!)
The only PostgreSQL configuration that changed throughout these tests was
max_parallel_workers_per_gather
, set to 2 by default in Pg10 and newer.
Parallel queries was introduced in Pg9
.6 but was that value was 0 by default.
Read my post on parallel query in Pg10 to learn more.
Initialize pgBench
Each test was ran with a fresh database initialized with a scale of 10 using these commands.
/usr/local/pgsql/bin/createdb bench_test
/usr/local/pgsql/bin/pgbench -i -s 10 bench_test
This creates a bench_test
database with the
pgbench_accounts
table storing 1M rows, taking up 133 MB for that table.
Row count:
postgres@tag103:~$ /usr/local/pgsql/bin/psql -d bench_test -c "SELECT COUNT(*) FROM pgbench_accounts;"
count
---------
1000000
(1 row)
Show tables with sizes:
postgres@tag103:~$ /usr/local/pgsql/bin/psql -d bench_test -c "\dt+"
List of relations
Schema | Name | Type | Owner | Size | Description
--------+------------------+-------+----------+--------+-------------
public | pgbench_accounts | table | postgres | 133 MB |
public | pgbench_branches | table | postgres | 96 kB |
public | pgbench_history | table | postgres | 41 MB |
public | pgbench_tellers | table | postgres | 392 kB |
Test ran
The following command is the pgbench
command used.
The only change was if the -T
parameter was adjusted between
3600 (1 hour) or 600 (10 minutes).
pgbench -c 10 -j 2 -T 3600 -P 60 bench_test
starting vacuum...end.
...
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 2
duration: 3600 s
number of transactions actually processed: 772292
latency average = 46.614 ms
latency stddev = 875.019 ms
tps = 214.522850 (including connections establishing)
tps = 214.523484 (excluding connections establishing)
Let's take a closer look at these two lines of the output at the end.
duration: 3600 s
number of transactions actually processed: 772292
More than 750k transactions per hour!
I think it's cool that a Raspberry Pi 3B with it's 5V power supply, SD card, ARM processor (1.2GHz) and 1GB of RAM handled 750k transactions in an hour. Then again, I am a bit biased.
Immediate Results
Within the first two runs of pgbench
it was obvious that one of my
Pi's was not keeping pace with the others. I ran it through the full
range of tests and found that particular hardware consistently ran 20%
slower than the median of the other three nodes. I had had issues
with that particular Pi in the past and my suspicion was the SD card is
going bad, this could be pointing to the same thing...
Over-baked Pi?
The results from the slow node have been removed from all aggregate results presented in the remainder of this post.
Takeaways
On the Raspberry Pi, you can expect roughly the same performance across versions 9.6 through 11.
Pg Version | Min TPS | Median TPS | Max TPS | Std Dev | # Tests |
---|---|---|---|---|---|
9.6.12 | 162 | 232 | 239 | 36.3 | 4 |
10.7 | 174 | 232 | 253 | 37.7 | 4 |
11.2 | 182 | 218 | 254 | 20.1 | 10 |
12-devel | 204 | 222 | 246 | 16 | 9 |
PostgreSQL 11 had less variance in its performance as seen in the standard deviation of TPS observed. Preliminary tests of the Pg12 development branch show further improvements in performance, but the real test is how it works when it goes live!
One more note one the above results is that versions 11 and 12 had
more than one test run at 1 hour, while versions 9.6 and 10 only received
a single one-hour test.
The difference between running this pgbench
test for 10 minutes versus 60
minutes caused a 10% drop in average TPS.
Parallel on Pi?
Starting with Postgres 10, the default is to give 2 cores to
parallel processing. Switching
max_parallel_workers_per_gather
between 2 and 0 had nearly
zero impact on that Pi, either for or against.
When working with mobile-scale technology, there are plenty of
other bottlenecks that parallel just won't help with. Also,
the queries used for the TPC-B (sort of)
tests are not the typical
workload that parallel query is expected to benefit.
Pg Version | # Parallel Workers | Min TPS | Median TPS | Max TPS | Std Dev | # Tests |
---|---|---|---|---|---|---|
11.2 | 0 | 205 | 220 | 254 | 19.4 | 6 |
11.2 | 2 | 182 | 218 | 233 | 21.8 | 4 |
12-devel | 0 | 205 | 222 | 246 | 16.5 | 5 |
12-devel | 2 | 204 | 218 | 244 | 18.1 | 4 |
Reporting style queries with aggregations are better suited for parallel query. See my follow-up post for those results.
Incremental reporting with pgbench
The pgbench
command I used includes -P 60
. This tells pgbench
to print out progress updates every 60 seconds. I highly recommend
you do this for longer tests, you'll probably see some details that
are otherwise obscured by aggregated details. Notice the severe
change in TPS and related metrics around 10 minutes (600 seconds)
into this test run.
pgbench -c 10 -j 2 -T 3600 -P 60 bench_test
starting vacuum...end.
...
progress: 540.0 s, 171.8 tps, lat 55.105 ms stddev 946.851
progress: 600.0 s, 24.6 tps, lat 435.693 ms stddev 2945.727
progress: 660.0 s, 405.8 tps, lat 24.108 ms stddev 134.692
...
Notice that across three minutes it went from 171 TPS, down to 25 TPS, and then back up to 400 TPS? That's a very big swing in performance...
Pi Hardware Considerations
These tests all use standard Pi hardware including the often-loathed micro-SD card. Ultimately, I made the decision to keep things as standard as possible in order to keep things generic. The SD cards used in these tests have all been in 24/7 use for at least six (6) months with occasional abuse as required for content like this, and how to load OSM data on the Pi. The slow-node mentioned earlier happens to live on the Pi that I've beat-up the most with OpenStreetMap and other large spatial data sets. I've repeated loaded multi-GB databases to that poor 16GB card... not surprised it is wearing out the SD card!
That said, I did perform the same basic tests with a few USB drives I had laying around. Putting the entire Postgres data directory (including logs, WAL, etc) on a USB drive seemed to drop performance considerably. That was unexpected, and bad. By putting either the data files or the logs/WAL on the USB (leaving the other portion on the SD card) seemed to get a moderate performance boost, though far from a major boost like I had hoped for. I suspect this is due to the USB bus sharing with Ethernet as mentioned in this article:
"... still shares a single USB channel to the SoC, as in previous designs..."
Related notes
The only other testing I could find of PostgreSQL on the Raspberry Pi was this post from 2013 on older hardware and PostgreSQL 9.2. The results for the "small" database in that post (slightly smaller than tested here) showed 14 TPS for the TPC-B test. Compared to >200 TPS that I found here!
Limitations
There are plenty of limitations when working with this hardware. This post has a good explanation of the shared bus issues, thermal limiting, and so forth.
The sort-of TCP-B tests are also a limitation. This test workload
looks nothing like the queries I actually run on a Raspberry Pi.
The queries used in the default pgbench
tests:
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
Most of what I use a Pi for is reporting type work including joins, aggregations, and such. Even using the read-only option with the default tests does not provide a real test for this type of workload.
Read part 2 (the reporting edition) for a look at testing a reporting-style PostgreSQL workload.
Summary
PostgreSQL truly is an amazing database platform to be able to handle such limited resources so eloquently. The Raspberry Pi is quite a cool little piece of hardware, you just have to understand what you're working with.
This little Rack-o-Pi
could easily be setup with a single primary node
to handle the writes, streaming replication the other three nodes
serving read-only queries and you'd be able to handle quite a respectable
load (assuming more read than write).
My next round of pgbench
testing with PostgreSQL will be on
more powerful systems. Stay tuned!
Need help with your PostgreSQL servers or databases? Contact us to start the conversation!