osm2pgsql on a Raspberry Pi

By Ryan Lambert -- Published January 29, 2019

I love the Raspberry Pi! It's an affordable little box of "I can do it" Linux-goodness. Some tasks, though, are not quite as easy to get right on the Pi, and getting osm2pgsql to run reliably on a Raspberry Pi 3B has been a challenge. This post examines one limitation of the Raspberry Pi system, how it can make osm2pgsql go horribly wrong, and how to configure the system to make it work.

If you are new to osm2pgsql, read my initial post first; I do not repeat the overall process here.

Raspberry Pi = Slow I/O

Update: Most of this complaint is resolved by using application class (A1 or A2) SD cards. The rest is resolved with the introduction of the Raspberry Pi 4.

The main storage of the Raspberry Pi is a micro SD card. While this is great from a cost and availability perspective, the performance of the best SD cards is sub-par for most relational database work. I mention this first, because even when working with more powerful hardware, including faster disks, the main limitation is disk I/O. This is even more of an issue on the Pi. In my previous post I warned:

Postgres and osm2pgsql both use a lot of disk I/O during this process!

If you run osm2pgsql and PostgreSQL/PostGIS all on a Raspberry Pi, even modest settings like I previously outlined can bring your Pi to its knees. The following Unable to handle kernel NULL pointer dereference... error is one of the many critical errors I have managed to encounter in this process.

[Image: error messages from a Raspberry Pi after running osm2pgsql with too many resources, including "Unable to handle kernel NULL pointer dereference at virtual address"]

Don't multi-process

I know, your Raspberry Pi 3B has a quad-core processor, and the osm2pgsql command accepts a --number-processes flag. You must avoid the temptation to try to utilize more of the CPU... do not succumb to the temptation! Your Pi is already I/O bound; extra processes only cause more contention for the very limited I/O, and you receive random failures in the process.

Changes to configuration

Make sure to follow the configuration settings outlined in my initial post. I make only two changes to the recommendations there. The first is in postgresql.conf, where shared_buffers is lowered to 100MB. The other is to lower the --cache option of the osm2pgsql command to 100 (also in MB).
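
As a minimal sketch of how those two changes fit together: the PostgreSQL side is a single line in postgresql.conf (shared_buffers = 100MB), and the osm2pgsql side can be assembled as below. The helper name, database name, and file name are hypothetical, and the --create and --slim flags are assumptions standing in for whatever the initial post recommends. The process count is passed explicitly because the default may differ between osm2pgsql versions.

```python
import shlex


def build_osm2pgsql_command(pbf_path, database, cache_mb=100, processes=1):
    """Return an osm2pgsql argument list tuned for a low-resource Pi import."""
    return [
        "osm2pgsql",
        "--create",                            # fresh import (assumed, not from this post)
        "--slim",                              # store intermediate data in Postgres (assumed)
        "--cache", str(cache_mb),              # node cache in MB, lowered to 100
        "--number-processes", str(processes),  # one process only; the Pi is I/O bound
        "--database", database,                # hypothetical database name
        pbf_path,                              # hypothetical path to the .osm.pbf file
    ]


# Print the command for review instead of running it.
command = build_osm2pgsql_command("colorado-latest.osm.pbf", "pgosm")
print(shlex.join(command))
```

Printing the command rather than executing it makes it easy to check the flags before committing the Pi to a multi-hour load. The same list can be handed to subprocess.run once it looks right.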

Finding the ideal settings

To get to the 100/100 settings above, I went through dozens of tests with quite a few failures on the way. Luckily, I happen to have a Pi Rack that I can put to work for experiments like this.

Time to run

I found timings on the Pi to be extremely variable with wall clock times varying by +/- 30%. Two of the Pis regularly ran on the faster side, while the other two ran on the slower side. If I had infinite time available to me, I would swap out SD cards between Pis and try to show that some of the SD cards are faster than others. But I don't, so for now that's just my best guess for the variance.
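
One rough way to test that guess without swapping cards would be to time small synchronous writes on each Pi. The sketch below is not part of the tests described here. The function name, block count, and target directory are all hypothetical, and forced 8 KiB writes (the default PostgreSQL page size) are only a crude proxy for the I/O pattern of a real import.

```python
import os
import tempfile
import time


def sync_writes_per_second(directory, block_size=8192, count=200):
    """Write `count` blocks, forcing each to storage, and return writes per second."""
    block = os.urandom(block_size)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, block)
            os.fsync(fd)  # wait for the storage device, not just the page cache
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    return count / elapsed if elapsed > 0 else float("inf")


# Placeholder target; point this at a directory on the SD card under test.
target = tempfile.gettempdir()
print(f"{sync_writes_per_second(target):.0f} synced 8 KiB writes per second")
```

If the Pis that load faster also report consistently higher numbers here, that would support the SD card explanation, though it would not rule out other causes.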

Source File	osm.pbf size	Time to Load (Average)
District of Columbia	15.6 MB	6.01 minutes
Kansas	54 MB	19.7 minutes
Colorado	167 MB	2.76 hours
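
To make the scaling in the table easier to see, the averages can be converted to throughput. This is plain arithmetic on the figures above. The function names are illustrative, and the 30% spread is the rough variance mentioned earlier, not a measured bound.

```python
# Average load times from the table, converted to minutes.
loads = {
    "District of Columbia": (15.6, 6.01),
    "Kansas": (54.0, 19.7),
    "Colorado": (167.0, 2.76 * 60),
}


def throughput_mb_per_minute(size_mb, minutes):
    """Megabytes of .osm.pbf input loaded per minute of wall clock time."""
    return size_mb / minutes


def variance_range(minutes, spread=0.30):
    """Low and high wall clock estimates for a +/- `spread` variance."""
    return minutes * (1 - spread), minutes * (1 + spread)


for name, (size_mb, minutes) in loads.items():
    rate = throughput_mb_per_minute(size_mb, minutes)
    low, high = variance_range(minutes)
    print(f"{name}: {rate:.2f} MB/min, roughly {low:.0f} to {high:.0f} minutes")
```

The two smaller files load at about 2.6 and 2.7 MB per minute, while Colorado drops to about 1.0 MB per minute. Load time does not scale linearly with file size, so extrapolating from a small extract will likely underestimate a larger one.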

Summary

The Raspberry Pi is a great little machine, and I love that PostgreSQL runs so well on such limited hardware. This post illustrates that with some extra care the Pi can run some seriously intensive processes. With these steps in place, I intend to start automating the four Pis in the rack to work through larger sets of OpenStreetMap data processing.

Last Updated September 13, 2019