PostGIS: Tame your spatial data (Part 1)
The goal of this post is to examine and fix one common headache when using spatial data: its size. Spatial data (commonly called "GIS data") can become painfully slow to work with because it gets big in a hurry. I discussed this concept before by reducing the OSM roads layer to provide low-overhead thematic layers. This post applies the same basic philosophy to another common bottleneck: polygon layers covering large areas, such as the US county boundaries.
This post goes into detail about using PostGIS to simplify large polygon objects, the effects this has on storage, performance and the accuracy of spatial analysis when using simplified geometries. The solution provided is not suitable for all data and/or use cases, particularly if a high level of accuracy and precision is required.
One of the main reasons PostgreSQL became my personal favorite RDBMS is its PostGIS extension, which provides what I consider the most robust spatial database in the world.
This post is an advanced topic of the series PostgreSQL: From Idea to Database.
This post assumes basic knowledge of SQL syntax, PostgreSQL and PostGIS.
A numeric example
To begin this conversation, let's review what we already know about storing numbers, starting with Pi (π).
Pi is a mathematical constant, roughly represented as 3.14159, though the decimals go on and on into infinity. The following example shows how to create a table with a single row storing two different representations of Pi. The first column, pi_long, stores 31 decimals of Pi. The pi_short column stores only two decimal places.
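-- Start clean so the example can be re-run in the same session
DROP TABLE IF EXISTS cool_numbers;
-- One row holding two representations of Pi at different precisions
CREATE TEMP TABLE cool_numbers AS
SELECT 3.1415926535897932384626433832795::NUMERIC(32,31) AS pi_long,
       3.14::NUMERIC(3,2) AS pi_short;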
As you may expect, storing a number with more digits takes more disk space than storing a number with fewer digits. The number of digits stored is called precision. The following query uses the pg_column_size() function to illustrate that the number with more decimal places takes up more disk space.
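A minimal sketch of that comparison, assuming the cool_numbers temp table created earlier still exists in the current session. The column aliases pi_long_bytes and pi_short_bytes are illustrative names chosen here, not taken from the original post.

```sql
-- pg_column_size() reports the number of bytes used to store each value
SELECT pg_column_size(pi_long) AS pi_long_bytes,
       pg_column_size(pi_short) AS pi_short_bytes
    FROM cool_numbers;
```

The exact byte counts depend on how PostgreSQL encodes NUMERIC values internally, but pi_long_bytes should come back noticeably larger than pi_short_bytes.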
PostgreSQL: From Idea to Database - Table of Contents
PostgreSQL: From Idea to Database is a series of blog posts written to teach practical database design. This series focuses on practical examples of working with Postgres to illustrate the why and how of operation.
Note: This series is an evolving and improving collection of blog posts. I regularly update this page as content is published and maintained.
Table of contents
The following listing includes links to the pages that are already published along with placeholders for planned future topics. Future topics (i.e. titles without links) are likely to change in title, exact content and scope.
PostgreSQL: From Idea to Database - Introduction
Databases are a critical component for nearly any modern application. The database is also a component that often is surrounded by confusion and apprehension.
This post is the first of the series PostgreSQL: From Idea to Database. The goal of this series is to provide a guide to how we approach database development at RustProof Labs. This includes covering both methodology and technology. This series provides working code examples, friendly explanations, and real-world database design scenarios. Hopefully this series helps explain and teach database design in a new, friendly format.
Project for reference
A major challenge with teaching database design is not having a real project to use as an example. For this, I've chosen to use The PiWS project, an open source weather station designed around the Raspberry Pi. To read more about the PiWS, see my introductory post.
The PiWS project was chosen for a variety of reasons, but there are three main reasons:
- I know the code
- The project was developed in a way worth describing
- It's open source!
The third reason is a big benefit because the source code used in this series will come from the project's GitHub repository. This gives us a real project with real code to study how to approach the task of designing a database.
Throughout this series I provide examples of how the PiWS project was initially designed and how it has evolved.
Introducing the PiWS (Pi Weather Station)
The PiWS, short for The Pi Weather Station, is an open source, affordable weather station designed for everyone. This is not a traditional, commercial weather station that provides you with a fixed set of sensors and collects data without giving you a way to keep your own data.
Instead, The PiWS is designed around open source hardware and software, provides you with a choice of sensors, and encourages long-term retention of your sensor data.
What does it do?
The PiWS uses a variety of analog and digital sensors to collect data from your local environment. When we talk about "environment", that could be indoor, outdoor, garden... or in the attic, basement, garage, or that one room upstairs that always seems cold.
The data from the sensors is collected in a PostgreSQL database on the Raspberry Pi, allowing you to store the data long term, analyze trends, or just satisfy your curiosity!
Why the PiWS?
The idea for the PiWS started around the time I started building TrackYourGarden (TYG) in 2015; it's another part of my desire to understand everything possible about our local micro-climate. I said at that time:
"It's hard to answer the questions of "When can I plant plant X?" living at 6,000 feet. We have a fairly short growing season; it has snowed on Mother's day two years in a row now, and it's not unusual to have at least one light snowfall around Halloween.... so naturally I figured the best thing to do was to gather some data (surprise!)"
GDPR Preparation: Google Analytics
This post was written after receiving an email from Google regarding our Google Analytics (GA) account. The email was received on 4/11/2018 with the subject line: “[Action Required] Important updates on Google Analytics Data Retention and the General Data Protection Regulation (GDPR).” This post includes a basic overview of the steps to take in your Google Analytics account to review and update your settings. The topic of data collection naturally leads into a discussion of data minimisation to round things out.
Philosophy
Data privacy is one of RustProof Labs’ core values. We believe GDPR is an example of moving the needle in the right direction in preserving and protecting our personal digital rights. Changes outlined in this document are in line with the opening statement in our Terms of Service:
RustProof Labs is a company committed to data, cybersecurity, learning, open source projects and are advocates for the right to privacy and security on the Internet.
This is another step in preparing for the approaching GDPR implementation on 5/25/2018 for anyone using Google Analytics. Our free Blog @ RustProof Labs, where you're reading this right now, receives hundreds of visitors each month from the EU. So, it is important that all RustProof Labs’ services, freely available blog included, are compliant.