Large Text in PostgreSQL: Performance and Storage
Storing large amounts of data in a single cell in the database has long been a point of discussion. This topic has surfaced in the form of design questions in database projects I have been involved with over the years. Often, it surfaces as a request to store images, PDFs, or other "non-relational" data directly in the database. I was an advocate for storing files on the file system for many, if not all, of those scenarios.
Then, after years of working with PostGIS data
I had the realization that much of my vector data that performs
so well when properly structured and queried, was larger and more complex
than many other blobs of data I had previously resisted.
Two years ago I made the decision to store images in a production database
using BYTEA
. We can guarantee there are a limited number of images
with a controlled maximum resolution (limiting size) and a specific use
case. There was also the knowledge that caching the images in the frontend
would be an easy solution if performance started declining.
This system is approaching two years in production with great performance.
I am so glad the project has a singular data source: PostgreSQL!
Use PostgreSQL file_fdw to Access External Data
Loading external data is a core task for many database administrators. With Postgres we have the powerful option of using Foreign Data Wrappers (FDW) to bring in external data sources. FDWs allow us to access data that is external to the database by using SQL from within the database. These external data sources can be in a number of formats, including other relational databases (Postgres, Oracle, MySQL), NoSQL sources (MongoDB, Redis), raw data files (csv, JSON) and many more.
This post shows how to use
file_fdw
to load remote data from CSV files available from GitHub. Sharing data
via public source control tools like GitHub has become a common way
to make data sets widely available. Other public data is available
from various government and non-profit sites, so this is a handy tool to have available and reuse.
External data source
For this post I am using the COVID-19 data John Hopkins University is curating. See the main GitHub page for full attribution and meta-details about the data.
Pi 4 Performance: PostgreSQL and PostGIS
Happy Pi (π) Day! I decided to celebrate with another post
looking at the Raspberry Pi 4's
performance running Postgres and PostGIS.
A few months ago I added another Raspberry Pi to my collection,
the new Model 4 with 4 GB RAM. My
initial review
focused on pgbench
results comparing the Pi 4 against the previous
3B models as well as the even lower-powered Raspberry Pi Zero W.
This post continues testing the Rasperry Pi 4's performance with
PostgreSQL and PostGIS, this time with a look at a more suitable
setup for production use cases.
The main difference are the use of an external SSD drive and
full-disk encryption.
Hardware and Configuration
The Raspberry Pi 4 is mounted in an enclosed Cana Kit case with a small fan in the top running on the 3.3 V rail (lower power), powered by a dedicated 3.5A power supply.
Webinar: PostGIS Advanced Features (6/6)
This page has the resources for the PostGIS Advanced Features webinar from Tuesday March 4, 2020. This session is the sixth and final (!) in a series of six covering how I work with PostGIS and OpenStreetMap. See the intro page for links to the other videos in the series.
Recording
Webinar: Postgres 12 and PostGIS 3 (5/6)
Below is the recording of my PostgreSQL 12 and PostGIS 3 webinar from February 26, 2020. This is the fifth in a series of six covering how I work with PostGIS and OpenStreetMap.
I had a great time presenting this session, I hope having this recording available is helpful. Below the video is a list the links showing throughout the slides.