RustProof Labs: blogging for education (logo)

PostgreSQL performance on Raspberry Pi, Reporting edition

By Ryan Lambert -- Published April 18, 2019

In a previous post I documented PostgreSQL's performance on the Raspberry Pi 3B over multiple versions of Postgres. There I used the standard TPC-B (sort of) tests included with pgbench that try to run as fast as possible. That's reasonable from a certain perspective, but falls short from helping to decide how much hardware you really need.

This post is part of the series PostgreSQL: From Idea to Database.

This post puts Postgres on the Pi under a more realistic workload (for me) and explains how I approached the task. For this post I'm using Postgres v11.2 for all tests and using three Raspberry Pi's for the testing. The configuration for Postgres is the same as the prior post and the only Postgres-specific tuning I do in this post is on max_parallel_workers_per_gather. That brings me to by secondary goal with this post, do a better job finding where PostgreSQL's parallel query feature works well, and where it can hurt performance. Parallel query was introduced in Pg 9.6 and has improved with each subsequent version.

My example workload

My Postgres-Pis are all disposable and easy to rebuild. I have a standard image created that I can flash to an SD card, run an Ansible deployment or two, and the Pi is setup for whatever I want it to do. Much of my querying is from a reporting/analytical perspective. This means I'm using wider tables with multiple data types, indexes, joins, and aggregations. Not tiny SELECT, UPDATE and INSERT queries.

Another difference is my Pi database servers aren't constantly bombarded at these high rates of connection. A lot of databases are idle much of the time (gasp) with occasional use by a handful of users in short bursts. Especially with a Raspberry Pi, I see those as "developer machines." They're cheap enough that each developer can have a rack (seriously, $250 for a rack of 4...) and they perform well enough, along with automation, that makes them seriously useful.

Luckily, pgbench gives us everything we need to change up our testing scenario a different workload in mind.

Setting up for Tests

I created a new GitHub repo to make these tests easy to share and reproduce. Tests for this post used commit 2fe73bf.

Switch to the postgres user and setup for the tests.

sudo su - postgres
mkdir ~/git
cd ~/git
git clone https://github.com/rustprooflabs/pgbench-tests.git
cd pgbench-tests

Custom initialization

Create the bench_test database and initialize it with the reporting tables. We need to use psql to seed the database instead of pgbench. With custom tests, we have custom tables and data loading too. For the Pi I use scale=2 to load the pgbench_reporting table with 200,000 rows.

createdb bench_test
psql -d bench_test -f init/reporting.sql -v scale=2

This creates a bench_test database with three tables. The main table being pgbench_reporting. Using a scale of 2 on this custom data set seeds 200,000 rows taking up a little under 40MB on disk.

psql -d bench_test -c "\dt+"

                           List of relations
┌────────┬───────────────────┬───────┬──────────┬───────┬─────────────┐
│ Schema │       Name        │ Type  │  Owner   │ Size  │ Description │
╞════════╪═══════════════════╪═══════╪══════════╪═══════╪═════════════╡
│ public │ pgbench_lookup1   │ table │ postgres │ 48 kB │             │
│ public │ pgbench_lookup2   │ table │ postgres │ 48 kB │             │
│ public │ pgbench_reporting │ table │ postgres │ 37 MB │             │
└────────┴───────────────────┴───────┴──────────┴───────┴─────────────┘
(3 rows)

Looking closer at the pgbench_reporting table there are a few types of data, foreign key constraints and indexes added to the foreign key columns.

bench_test=# \d pgbench_reporting 
             Table "public.pgbench_reporting"
┌────────┬──────────────┬───────────┬──────────┬─────────┐
│ Column │     Type     │ Collation │ Nullable │ Default │
╞════════╪══════════════╪═══════════╪══════════╪═════════╡
│ id     │ bigint       │           │ not null │         │
│ d1     │ text         │           │          │         │
│ fk1    │ bigint       │           │ not null │         │
│ fk2    │ bigint       │           │ not null │         │
│ v1     │ numeric(8,4) │           │          │         │
│ v2     │ integer      │           │          │         │
│ d2     │ text         │           │          │         │
└────────┴──────────────┴───────────┴──────────┴─────────┘
Indexes:
    "pk_pgbench_reporting" PRIMARY KEY, btree (id)
    "ix_pgbench_reporting_fk1" btree (fk1)
    "ix_pgbench_reporting_fk2" btree (fk2)
Foreign-key constraints:
    "fk_pgbench_reporting_fk1" FOREIGN KEY (fk1) REFERENCES pgbench_lookup1(id)
    "fk_pgbench_reporting_fk2" FOREIGN KEY (fk2) REFERENCES pgbench_lookup2(id)

This structure is intended to simulate common types of tables seen often "in the wild" of production databases.

View for test queries

The initialization script above also created a view named pgbench_view used for the tests. This handles the joins to use the foreign keys defined on the base table, the added columns are aliased as fk1data and fk2data.

\d pgbench_view 
                View "public.pgbench_view"
┌─────────┬──────────────┬───────────┬──────────┬─────────┐
│ Column  │     Type     │ Collation │ Nullable │ Default │
╞═════════╪══════════════╪═══════════╪══════════╪═════════╡
│ id      │ bigint       │           │          │         │
│ d1      │ text         │           │          │         │
│ fk1     │ bigint       │           │          │         │
│ fk1data │ text         │           │          │         │
│ fk2     │ bigint       │           │          │         │
│ fk2data │ text         │           │          │         │
│ v1      │ numeric(8,4) │           │          │         │
│ v2      │ integer      │           │          │         │
│ d2      │ text         │           │          │         │
└─────────┴──────────────┴───────────┴──────────┴─────────┘

Test queries

I used three queries for this testing (one, two, three). While pgbench allows you to weight these queries I decided to keep all weights equal to start with. The first query (tests/reporting1.sql) calculates a few aggregates grouped by fk1 and fk1data.

SELECT fk1, fk1data,
    COUNT(*) AS cnt,
    AVG(v1) AS v1avg,
    MAX(v2) AS v2avg
FROM pgbench_view
GROUP BY fk1, fk1data
;

The plan from EXPLAIN ANALYZE with max_parallel_workers_per_gather = 2 (default in Pg11) shows that Postgres does decide to use Parallel for this query. This indicates our setting here could help or hurt performance.

...
Workers Planned: 2
Workers Launched: 2
...
    Sort Method: quicksort  Memory: 19kB
    Worker 0:  Sort Method: quicksort  Memory: 19kB
    Worker 1:  Sort Method: quicksort  Memory: 19kB
...

The TCP-B (sort of) tests do not benefit/hurt from parallel queries as of Pg11. (At least not on the Pi!)

Testing full speed

First I ran pgbench for 1 hour with 10 clients and my three new queries. This had the result of 5-6 TPS on nearly any configuration. This was frankly rather boring.

pgbench -c 10 -j 2 -T 3600 -P 60 -s 2 \
    -f tests/reporting1.sql \
    -f tests/reporting2.sql \
    -f tests/reporting3.sql \
    bench_test

Customizing pgbench

I want to pull back on the rate of execution to see what is reasonable to expect from Postgres on a Pi for a single-user. I don't expect this user to be hitting F5 (execute query) as fast as machine-possible either. We've already tried that, now lets see what happens when things are made more realistic for this tiny hardware.

To do this, I need to define my quality expectations for performance. Let's say I want to run one query once per second and I expect full results in 500ms or less (e.g. latency, or lat). This definition can now be translated to pgbench using a couple new options. Adding -R 1 -r --latency-limit=500 to the pgbench command should achieve these results. The option -R 1 sets the target rate at 1 TPS and --latency-limit=500 sets the performance threshold at 500ms.

Performance of your database is a quality metric. The trick for planning is to translate between user expectations and our testing methodology.

pgbench -c 1 -j 1 -T 3600 -P 30 -s 2 \
    -R 1 -r --latency-limit=500 \
    -f tests/reporting1.sql \
    -f tests/reporting2.sql \
    -f tests/reporting3.sql \
    bench_test

...
progress: 330.0 s, 0.8 tps, lat 216.944 ms stddev 144.121, lag 15.872 ms, 0 skipped
progress: 360.0 s, 1.1 tps, lat 257.683 ms stddev 155.640, lag 45.778 ms, 1 skipped
progress: 390.0 s, 1.2 tps, lat 208.859 ms stddev 130.061, lag 21.272 ms, 0 skipped
progress: 420.0 s, 1.0 tps, lat 260.450 ms stddev 141.157, lag 45.121 ms, 1 skipped
progress: 450.0 s, 1.2 tps, lat 253.322 ms stddev 171.986, lag 56.560 ms, 0 skipped
...
number of transactions actually processed: 611
number of transactions skipped: 3 (0.489 %)
number of transactions above the 500.0 ms latency limit: 16/611 (2.619 %)
latency average = 237.988 ms
latency stddev = 144.305 ms
rate limit schedule lag: avg 32.037 (max 462.121) ms
...

Note: Setting clients (-c) higher than than TPS rate limits (-R) limits all clients to the defined rate. This probably only matters on tiny hardware!

The pgbench output now provides information about "slow" and "skipped" queries in the summary. In this test case, 0.5% were skipped and 2.6% of the queries took longer than 0.5 seconds to execute.

number of transactions skipped: 3 (0.489 %)
number of transactions above the 500.0 ms latency limit: 16/611 (2.619 %)

Skipped transactions are explained in the docs:

"If --latency-limit is used together with --rate, a transaction can lag behind so much that it is already over the latency limit when the previous transaction ends, because the latency is calculated from the scheduled start time. Such transactions are not sent to the server, but are skipped altogether and counted separately.""

Test Results

Now for the results from the bulk of the testing. All tests were ran for 1 hour (-T 3600) and most of them were ran at least twice on multiple Pis. The "# of Clients" (N) is controlled by setting both clients (-c N) and rate (-R N) via pgbench. For these purposes, this simulates each client attempting to achieve 1 TPS. The "Testing full speed" section above found that these queries on this hardware will max out at 5-6 TPS so I tested below and above those limits (1-8) for number of clients.

I didn't test 3 or 7 clients because I didn't expect them to provide meaningful insight.

The only PostgreSQL-specific parameter controlled for in these tests was max_parallel_workers_per_gather, all other Postgres configurations are as documented in my prior post with nearly all configurations at their defaults.

Late queries

The following chart and table show the percentage (%) of late queries. In other words, how many queries took longer than the target 500ms execution time? With max_parallel_workers_per_gather = 0 these queries didn't perform as hoped more than 1/3 of the time. The good news is, performance was consistent up through 5 clients (read: predictable). Performance really starts to drop off with 6 or more clients; by 8 clients about 2/3 queries were slow.

With max_parallel_workers_per_gather set at either 2 or 4 (the Pi has 4 cores), and 1 or 2 clients, well under 10% of queries were slow. Performance was also noticeably better with this setting at 4 clients than it was with the setting at 0. Once the number of clients exceeds the number of cores, performance starts to drop off for these queries. At 6 clients, edging past the already known TPS limit for the hardware, performance drops off a cliff with max_parallel_workers_per_gather > 0. With 8 clients, nearly 90% of queries perform poorly.

Chart showing % of Late queries based on # of clients and # of PostgreSQL parallel workers

Table showing % of Late queries based on # of clients and # of PostgreSQL parallel workers

Skipped queries

The skipped transactions data shows similar results. Starting again with max_parallel_workers_per_gather = 0, roughly 15% of queries were skipped when the number of clients was 6 or less. With 8 clients this failure rate spiked up to more than 1/3 being skipped. With max_parallel_workers_per_gather set at either 2 or 4, the query skip rate held at 1% or lower as long as the number of clients was lower than the number of cores.

Chart showing % of Skipped queries based on # of clients and # of PostgreSQL parallel workers

Table showing % of Skipped queries based on # of clients and # of PostgreSQL parallel workers

Summary

I had two goals for this post, let's see how they worked out.

First, was to preform more realistic testing of PostgreSQL performance on the Pi for my workloads. I found the Raspberry Pi 3B is perfectly capable of good performance (as defined above) with analytical-ish queries on a database of modest size (200k rows) and a small (<= 4) number of concurrent users. Would I recommend this hardware for heavy-duty mission critical databases? No. (Why would you even consider that?) I don't use VirtualBox virtual machines on my laptop to power client databases either. But I do use them all the time for development, testing, and other non-critical use cases to enable the production systems. The Pi is no different here.

These results confirm my personal experience of using a Pi in this type of scenario.

Second, was to more thoroughly explore the thresholds and limits of parallel query. Last year I did some initial testing of parallel query performance on Pg10 and I wanted closer look at that. The results in this post illustrate that in some use cases parallel query can significantly help performance. That said, you need to be careful with the number of simultaneous clients and the setting of max_parallel_workers_per_query.

Testing and monitoring are your friends. No results I (or anyone) can publish can ever tell you how your data and your workload will perform on Hardware X.

Need help with your PostgreSQL servers or databases? Contact us to start the conversation!

By Ryan Lambert
Published April 18, 2019
Last Updated April 18, 2019