osm2pgsql on a Raspberry Pi
I love the Raspberry Pi! It's an affordable little box of "I can do it" Linux-goodness.
Though, some tasks are not quite as easy to get right on the Pi, and getting osm2pgsql to run reliably on a Raspberry Pi 3B has been a challenge.
This post examines one limitation of the Raspberry Pi system, how this can make
osm2pgsql go horribly wrong, and how to configure the system to make it work.
If you are new to osm2pgsql, read my initial post first; I do not repeat the overall process here.
Raspberry Pi = Slow I/O
Update: Most of this complaint is resolved by using application class (A1 or A2) SD cards. The rest is resolved with the introduction of the Raspberry Pi 4.
The main storage of the Raspberry Pi is a micro SD card. While this is great from a cost and availability perspective, the performance of the best SD cards is sub-par for most relational database work. I mention this first, because even when working with more powerful hardware, including faster disks, the main limitation is disk I/O. This is even more of an issue on the Pi. In my previous post I warned:
Postgres and osm2pgsql both use a lot of disk I/O during this process!
If you run osm2pgsql and PostgreSQL/PostGIS all on a Raspberry Pi, even modest settings like the ones I previously outlined can bring your Pi to its knees. The following "Unable to handle kernel NULL pointer dereference..." error is one of the many critical errors I have managed to encounter in this process.
Don't multi-process
I know, your Raspberry Pi 3B has a quad-core processor. And the osm2pgsql command accepts a --number-processes flag. You must avoid the temptation to try to utilize more of the CPU... do not succumb to the temptation! Your Pi is already I/O bound; you will only cause more contention for the very limited I/O and receive random failures in the process.
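As a rough sketch, the single-process run can be spelled out on the command line so nobody is tempted to bump it later. The database name pgosm and the file name are placeholders I made up for illustration, not values from this post; the other options from my initial post would be added alongside.

```shell
# Placeholders (assumptions): adjust to your own database and extract.
OSM_DB="pgosm"
OSM_FILE="district-of-columbia-latest.osm.pbf"

# Keep osm2pgsql to a single process, even on the quad-core Pi 3B.
PROCESSES=1
OSM_CMD="osm2pgsql --number-processes ${PROCESSES} --database ${OSM_DB} ${OSM_FILE}"

# Print the command for review before running it.
echo "$OSM_CMD"
```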
Changes to configuration
Make sure to follow the configuration settings outlined in my initial post.
There are only two changes I make from the recommendations there. The first change is in postgresql.conf, where shared_buffers is lowered to 100MB. The other change is to lower the --cache option of the osm2pgsql command to 100 (MB) as well.
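Put together, the two changes might look like the sketch below. The shared_buffers line belongs in postgresql.conf (PostgreSQL needs a restart to pick it up), so it is shown here only as a comment. The database and file names are hypothetical placeholders.

```shell
# In postgresql.conf (restart PostgreSQL afterwards):
#   shared_buffers = 100MB

# osm2pgsql node cache size in MB, matching the shared_buffers value.
CACHE_MB=100

# Placeholders (assumptions): database "pgosm" and the input file name.
OSM_LOAD_CMD="osm2pgsql --cache ${CACHE_MB} --database pgosm district-of-columbia-latest.osm.pbf"

# Print the command for review before running it.
echo "$OSM_LOAD_CMD"
```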
Finding the ideal settings
To get to the 100/100 settings above, I went through dozens of tests with quite a few failures on the way. Luckily, I happen to have a Pi Rack that I can put to work for experiments like this.
Time to run
I found timings on the Pi to be extremely variable with wall clock times varying by +/- 30%. Two of the Pis regularly ran on the faster side, while the other two ran on the slower side. If I had infinite time available to me, I would swap out SD cards between Pis and try to show that some of the SD cards are faster than others. But I don't, so for now that's just my best guess for the variance.
Source File | osm.pbf size | Time to Load (Average) |
---|---|---|
District of Columbia | 15.6 MB | 6.01 minutes |
Kansas | 54 MB | 19.7 minutes |
Colorado | 167 MB | 2.76 hours |
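
To put those averages on a common scale, here is a quick back-of-the-envelope calculation of load throughput in MB of .pbf per minute. It only uses the numbers from the table above (with Colorado's 2.76 hours converted to 165.6 minutes), and given the variance I described, treat the results as rough.

```shell
# Sizes in MB and average load times in minutes, copied from the table.
# Colorado: 2.76 hours * 60 = 165.6 minutes.
THROUGHPUT=$(printf '%s\n' \
  "District of Columbia,15.6,6.01" \
  "Kansas,54,19.7" \
  "Colorado,167,165.6" |
  awk -F',' '{ printf "%s: %.2f MB/min\n", $1, $2 / $3 }')

echo "$THROUGHPUT"
```

On these averages the Colorado extract loads at well under half the rate of the two smaller extracts, so load time on the Pi does not scale linearly with file size.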
Summary
The Raspberry Pi is a great little machine, and I love that PostgreSQL runs so well on such limited hardware. This post illustrates that with some extra care the Pi can run some seriously intensive processes. With these steps in place, I intend to start automating the four Pis in the rack to work through larger sets of OpenStreetMap data processing.