RustProof Labs: blogging for education (logo)

Postgres Permissions and Materialized Views

By Ryan Lambert -- Published July 05, 2021

Materialized views in Postgres are a handy way to persist the result of a query to disk. This is helpful when the underlying query is expensive and slow yet high performance SELECT queries are required. When using materialized views they need to be explicitly refreshed to show changes to the underlying table. This is done through the REFRESH MATERIALIZED VIEW <name>; syntax.

Keeping materialized views regularly refreshed is often a delegated to a cron job on a schedule. There is also often a need for database users to manually refresh the data on demand. At this point many users stub their toe on permissions because refreshing a materialized view can only be done by the owner of the materialized view. This post uses a simple example to illustrate how to delegate refresh permissions to other Postgres roles.

Continue Reading

Use BIGINT in Postgres

By Ryan Lambert -- Published June 05, 2021

This post examines a common database design decision involving the choice of using BIGINT versus INT data types. You may already know that the BIGINT data type uses twice the storage on disk (8 bytes per value) compared to the INT data type (4 bytes per value). Knowing this, a common decision is to use INT wherever possible, only resorting to using BIGINT when it was obvious* that the column will be storing values greater than 2.147 Billion (the max of INT).

That's what I did too, until 2-3 years ago! I started changing my default mindset to using BIGINT over INT, reversing my long-held habit. This post explains why I default to using BIGINT and examine the performance impacts of the decision.


As I conclude at the end:

The tests I ran here show that a production-scale database with properly sized hardware can handle that slight overhead with no problem.

Why default to BIGINT?

The main reason to default to BIGINT is to avoid INT to BIGINT migrations. The need to do an INT to BIGINT migration comes up at the least opportune time and the task is time consuming. This type of migration typically involves at least one column used as a PRIMARY KEY and that is often used elsewhere as a FOREIGN KEY on other table(s) that must also be migrated.

In the spirit of defensive database design, BIGINT is the safest choice. Remember the *obvious part mentioned above? Planning and estimating is a difficult topic and people (myself included) get it wrong all the time! Yes, there is overhead for using BIGINT, but I believe the overhead associated with the extra 4 bytes is trivial for the majority of production databases.

Continue Reading

OpenStreetMap to PostGIS is getting lighter

By Ryan Lambert -- Published May 01, 2021

If you have ever wanted OpenStreetMap data in Postgres/PostGIS, you are probably familiar with the osm2pgsql tool. Lately I have been writing about the osm2pgsql developments with the new Flex output and how it is enabling improved data quality. This post changes focus away from the flex output and examines the performance of the osm2pgsql load itself.

One challenge with osm2pgsql over the years has been generic recommendations have been difficult to make. The safest recommendation for nearly any combination of hardware and source data size was to use osm2pgsql --slim --drop to put most of the intermediate data into Postgres instead of relying directly on RAM, which it needed a lot of. This choice has offsetting costs of putting all that data into Postgres (only to be deleted) in terms of disk usage and I/O performance.

A few days ago, a pull request from Jochen Topf to create a new RAM middle caught my eye. The text that piqued my interest (emphasis mine):

When not using two-stage processing the memory requirements are much much smaller than with the old ram middle. Rule of thumb is, you'll need about 1GB plus 2.5 times the size of the PBF file as memory. This makes it possible to import even continent-sized data on reasonably-sized machines.

Wait... what?! Is this for real??

Continue Reading

(Webinar) OpenStreetMap to PostGIS: Easier and Better!

By Ryan Lambert -- Published March 31, 2021

This page has the resources and recording for the OpenStreetMap to PostGIS: Easier and Better! webinar from Wednesday March 31, 2021.

Downloads for session

Scripts used for the demo:

Continue Reading

Round Two: Partitioning OpenStreetMap

By Ryan Lambert -- Published February 26, 2021

A few weeks ago I decided to seriously consider Postgres' declarative table partitioning for our OpenStreetMap data. Once the decision was made to investigate this option, I outlined our use case with requirements to keep multiple versions of OpenStreetMap data over time. That process helped draft my initial plan for how to create and manage the partitioned data. When I put the initial code to the test I found a snag and adjusted the plan.

This post shows a working example of how to partition OpenStreetMap data loaded using PgOSM-Flex.


Spoiler alert!

It works, I love it! I am moving forward with the plan outlined in this post. Some highlights from testing with Colorado sized data:

  • Bulk import generates 17% less WAL
  • Bulk delete generates 99.8% less WAL
  • Simple aggregate query runs 75% faster

Continue Reading