RustProof Labs: blogging for education (logo)

PostgreSQL performance on Raspberry Pi

By Ryan Lambert -- Published April 03, 2019

Have you ever wondered how well PostgreSQL performs on a Raspberry Pi 3B? Oh. You haven't? Oh well, I asked myself that question and put it to the test. This post covers how I used pgbench to answer this question and the results of what I found.

I love the Raspberry Pi, my post Low Power Computing with Raspberry Pi explains more.

This post is part of the series PostgreSQL: From Idea to Database.

TLDR;

Modern PostgreSQL (>=9.6), out-of-the-box, achieves around 200 TPS (TPC-B (sort of)) on a Raspberry Pi 3B. That's 17+ million transactions per day!

What was tested

I tested four PostgreSQL versions, 9.6.12, 10.7, 11.2, and 12-dev. Tests were ran using four Raspberry Pi 3Bs mounted as a Rack-o-Pi. The first round of testing used one version of Postgres installed on each Pi. For the rest of the testing I focused on Pg11 and Pg12.

These little machines put in more than 14 hours of digital blood, sweat and tears to produce this post!

View this post on Instagram

The #raspberrypi looks good when rack mounted! They also do quite a bit of work too! #piws #postgresql #openstreetmap #trackyourgarden

A post shared by Track Your Garden (@trackyourgarden) on

All versions of Postgres tested were built from source. pgBench ran locally on each machine.

Postgres Configuration

I kept most of the default configurations for each version, though there are four (4) settings I controlled for this testing.

All other settings are whatever the default was for that setting in the tested version.

The defaults for checkpoint_timeout and checkpoint_completion_target (5min and 0.5) don't work well for this case, I've adjusted those to 30min and 0.8 for all tests. I didn't capture the results from those initial tests with the default settings, but the difference was noticeable.

wal_level deserves a mention; the default in Pg9.6 was minimal and in Pg10 and later the default is replica. This change requires a lot of extra WAL activity that seriously hinders performance on the Pi. All tests here used wal_level = minimal. (Again, initial tests with replica were noticably slower!)

The only PostgreSQL configuration that changed throughout these tests was max_parallel_workers_per_gather, set to 2 by default in Pg10 and newer. Parallel queries was introduced in Pg9 .6 but was that value was 0 by default.

Read my post on parallel query in Pg10 to learn more.

Initialize pgBench

Each test was ran with a fresh database initialized with a scale of 10 using these commands.

/usr/local/pgsql/bin/createdb bench_test
/usr/local/pgsql/bin/pgbench -i -s 10 bench_test

This creates a bench_test database with the pgbench_accounts table storing 1M rows, taking up 133 MB for that table.

Row count:

postgres@tag103:~$ /usr/local/pgsql/bin/psql -d bench_test -c "SELECT COUNT(*) FROM pgbench_accounts;"
  count  
---------
 1000000
(1 row)

Show tables with sizes:

postgres@tag103:~$ /usr/local/pgsql/bin/psql -d bench_test -c "\dt+"
                          List of relations
 Schema |       Name       | Type  |  Owner   |  Size  | Description 
--------+------------------+-------+----------+--------+-------------
 public | pgbench_accounts | table | postgres | 133 MB | 
 public | pgbench_branches | table | postgres | 96 kB  | 
 public | pgbench_history  | table | postgres | 41 MB  | 
 public | pgbench_tellers  | table | postgres | 392 kB |

Test ran

The following command is the pgbench command used. The only change was if the -T parameter was adjusted between 3600 (1 hour) or 600 (10 minutes).

pgbench -c 10 -j 2 -T 3600 -P 60 bench_test 
starting vacuum...end.

...

transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 2
duration: 3600 s
number of transactions actually processed: 772292
latency average = 46.614 ms
latency stddev = 875.019 ms
tps = 214.522850 (including connections establishing)
tps = 214.523484 (excluding connections establishing)

Let's take a closer look at these two lines of the output at the end.

duration: 3600 s
number of transactions actually processed: 772292

More than 750k transactions per hour!

I think it's cool that a Raspberry Pi 3B with it's 5V power supply, SD card, ARM processor (1.2GHz) and 1GB of RAM handled 750k transactions in an hour. Then again, I am a bit biased.

Immediate Results

Within the first two runs of pgbench it was obvious that one of my Pi's was not keeping pace with the others. I ran it through the full range of tests and found that particular hardware consistently ran 20% slower than the median of the other three nodes. I had had issues with that particular Pi in the past and my suspicion was the SD card is going bad, this could be pointing to the same thing...

Over-baked Pi?

The results from the slow node have been removed from all aggregate results presented in the remainder of this post.

Takeaways

On the Raspberry Pi, you can expect roughly the same performance across versions 9.6 through 11.

Pg Version Min TPS Median TPS Max TPS Std Dev # Tests
9.6.12 162 232 239 36.3 4
10.7 174 232 253 37.7 4
11.2 182 218 254 20.1 10
12-devel 204 222 246 16 9

PostgreSQL 11 had less variance in its performance as seen in the standard deviation of TPS observed. Preliminary tests of the Pg12 development branch show further improvements in performance, but the real test is how it works when it goes live!

One more note one the above results is that versions 11 and 12 had more than one test run at 1 hour, while versions 9.6 and 10 only received a single one-hour test. The difference between running this pgbench test for 10 minutes versus 60 minutes caused a 10% drop in average TPS.

Parallel on Pi?

Starting with Postgres 10, the default is to give 2 cores to parallel processing. Switching max_parallel_workers_per_gather between 2 and 0 had nearly zero impact on that Pi, either for or against. When working with mobile-scale technology, there are plenty of other bottlenecks that parallel just won't help with. Also, the queries used for the TPC-B (sort of) tests are not the typical workload that parallel query is expected to benefit.

Pg Version # Parallel Workers Min TPS Median TPS Max TPS Std Dev # Tests
11.2 0 205 220 254 19.4 6
11.2 2 182 218 233 21.8 4
12-devel 0 205 222 246 16.5 5
12-devel 2 204 218 244 18.1 4

Reporting style queries with aggregations are better suited for parallel query. See my follow-up post for those results.

Incremental reporting with pgbench

The pgbench command I used includes -P 60. This tells pgbench to print out progress updates every 60 seconds. I highly recommend you do this for longer tests, you'll probably see some details that are otherwise obscured by aggregated details. Notice the severe change in TPS and related metrics around 10 minutes (600 seconds) into this test run.

pgbench -c 10 -j 2 -T 3600 -P 60 bench_test 
starting vacuum...end.

...

progress: 540.0 s, 171.8 tps, lat 55.105 ms stddev 946.851
progress: 600.0 s, 24.6 tps, lat 435.693 ms stddev 2945.727
progress: 660.0 s, 405.8 tps, lat 24.108 ms stddev 134.692

...

Notice that across three minutes it went from 171 TPS, down to 25 TPS, and then back up to 400 TPS? That's a very big swing in performance...

Pi Hardware Considerations

These tests all use standard Pi hardware including the often-loathed micro-SD card. Ultimately, I made the decision to keep things as standard as possible in order to keep things generic. The SD cards used in these tests have all been in 24/7 use for at least six (6) months with occasional abuse as required for content like this, and how to load OSM data on the Pi. The slow-node mentioned earlier happens to live on the Pi that I've beat-up the most with OpenStreetMap and other large spatial data sets. I've repeated loaded multi-GB databases to that poor 16GB card... not surprised it is wearing out the SD card!

That said, I did perform the same basic tests with a few USB drives I had laying around. Putting the entire Postgres data directory (including logs, WAL, etc) on a USB drive seemed to drop performance considerably. That was unexpected, and bad. By putting either the data files or the logs/WAL on the USB (leaving the other portion on the SD card) seemed to get a moderate performance boost, though far from a major boost like I had hoped for. I suspect this is due to the USB bus sharing with Ethernet as mentioned in this article:

"... still shares a single USB channel to the SoC, as in previous designs..."

Related notes

The only other testing I could find of PostgreSQL on the Raspberry Pi was this post from 2013 on older hardware and PostgreSQL 9.2. The results for the "small" database in that post (slightly smaller than tested here) showed 14 TPS for the TPC-B test. Compared to >200 TPS that I found here!

Limitations

There are plenty of limitations when working with this hardware. This post has a good explanation of the shared bus issues, thermal limiting, and so forth.

The sort-of TCP-B tests are also a limitation. This test workload looks nothing like the queries I actually run on a Raspberry Pi. The queries used in the default pgbench tests:

UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);

Most of what I use a Pi for is reporting type work including joins, aggregations, and such. Even using the read-only option with the default tests does not provide a real test for this type of workload.

Read part 2 (the reporting edition) for a look at testing a reporting-style PostgreSQL workload.

Summary

PostgreSQL truly is an amazing database platform to be able to handle such limited resources so eloquently. The Raspberry Pi is quite a cool little piece of hardware, you just have to understand what you're working with.

This little Rack-o-Pi could easily be setup with a single primary node to handle the writes, streaming replication the other three nodes serving read-only queries and you'd be able to handle quite a respectable load (assuming more read than write).

My next round of pgbench testing with PostgreSQL will be on more powerful systems. Stay tuned!

Need help with your PostgreSQL servers or databases? Contact us to start the conversation!

By Ryan Lambert
Published April 03, 2019
Last Updated April 18, 2019