Revisiting Decisions: Improving Systems and Processes
Revisiting your past code is a worthwhile exercise. More specifically, revisiting the decisions, reasons, and other factors that led to the current state is extremely valuable. Authors of technical work (databases, applications, calculations, etc.) should re-evaluate the purpose of their systems and decide when "the way it's always been" needs to become "the way it used to be." While it is easy to proclaim this, it is often difficult to do in practice. Why? Because it's complicated, of course! 🙄 The more serious answer is that humans, including yourself, are involved. Technical people like to focus on the technology, even though the technology is often a side story to the bigger picture.
This post tells the story of a single project that spanned more than a decade. While there are technical elements involved, you'll see that some of the biggest technical issues were human-created.
A non-exhaustive list of the human elements includes:
- Who was involved? (developer, end-user, other stakeholders, etc.)
- Experience of the people involved.
- Internal politics, then and now.
- Process and systems evolution.
The Widgets Database
This story you are about to read is true. The names have been changed to protect the innocent.
For this post, we will refer to the project as "The Widgets Database."
It started with an Excel file that had far exceeded what should be done in Excel.
The top of the main Widgets worksheet had a line with the warning: "Do not sort, you will break formulas." Spoiler alert! The formulas were already broken.
Can you use ltree for Nested Place Data?
The topic of the ltree data type has come up a few times recently. This intersects with a common type of query used in PostGIS: nested geometries. An example of nested geometries is that the state of Colorado exists within the United States. The PgOSM Flex project calculates and stores nested polygon data from OpenStreetMap places into a handful of array (TEXT[], BIGINT[]) columns.
I decided to explore ltree to see if it would be a suitable option for PgOSM Flex nested places. Spoiler alert: ltree is not suitable for OpenStreetMap data in the way I would want to use it.
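For context, the same nesting expressed with ltree might look like the following sketch. The underscore-joined labels are an assumption for illustration; ltree labels only allow a restricted character set (letters, digits, and underscores, plus hyphens in newer Postgres versions), which is a big part of why real-world place names are an awkward fit.

-- Illustrative only: place names must be transformed to satisfy
-- ltree's label rules before they can be stored this way.
CREATE EXTENSION IF NOT EXISTS ltree;
SELECT 'United_States.Colorado'::ltree <@ 'United_States'::ltree
    AS colorado_is_within_us
;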
Nested data in arrays
The following is what "Colorado is in the U.S." would look like using a Postgres TEXT[] array:
{"United States","Colorado"}
Hosting a set of Postgres Demo databases
In April 2023, I submitted my proposal for a full-day pre-conference session at PASS 2023. My chosen topic was focused on PostGIS, titled GIS Data, Queries, and Performance. A key part of my submission was that the session would be an interactive, follow-along design. Julie and I believe that doing is key to learning, so we wanted to reinforce that as much as possible. The plan was to use real data and queries to teach a nuanced, technical topic to an audience of unknown size and background. I also knew that PASS is very much a Microsoft-focused community.
Knowing these things, I could not assume specific pre-existing knowledge about Postgres and PostGIS. I also didn't want to assume they had a Postgres 15 instance with PostGIS immediately available. I decided the best approach was to provide each participant with a demo Postgres database so they didn't have to worry about those steps. These demo databases would be pre-loaded with the same data and extensions I used for my demos. This would allow participants to run the same queries, on the same data, on the same general hardware.
Of course, when my proposal was accepted, I realized I had to figure out how I was actually going to deliver! This post explains how I deployed demo databases to the participants of my PASS 2023 pre-con session.
The net result was a reliable, secure (enough), scalable, and affordable setup.
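As a rough sketch of what provisioning a single participant could look like in plain SQL, consider the following; the role name, database name, and password are illustrative assumptions, not the exact setup used for the session.

-- Illustrative sketch for one participant; the real setup repeated this
-- for many participants with additional security considerations.
CREATE ROLE participant_01 WITH LOGIN PASSWORD 'change-me';
CREATE DATABASE demo_01 WITH OWNER participant_01;
-- Connect to demo_01 before creating extensions and loading data.
CREATE EXTENSION IF NOT EXISTS postgis;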
UUID in Postgres: PgSQLPhriday #015
This month's PgSQLPhriday #015 topic is about UUIDs. Lætitia Avrot is this month's host; see her post for the full challenge text. The topic is described as a debate between the Database People and Developers. I'm not sure there's such a clean divide in people's opinions on the topic, as I know plenty of Database People that have settled on using UUIDs as their default. Similarly, I know even more developer types that have followed the arguably more conventional choice of using an auto-incrementing ID.
TL;DR
I avoid UUIDs. The only places I have used UUIDs in production are the places where a 3rd party system is involved.
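To make the two defaults concrete, here is a minimal sketch of both approaches; the table and column names are placeholders.

-- Auto-incrementing identity column, the conventional choice mentioned above.
CREATE TABLE widget_seq (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name TEXT NOT NULL
);

-- UUID primary key; gen_random_uuid() is built in since Postgres 13.
CREATE TABLE widget_uuid (
    id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
    name TEXT NOT NULL
);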
Working with GPS data in PostGIS
One of the key elements to using PostGIS is having spatial data to work with! Lucky for us, one big difference today compared to the not-so-distant past is that essentially everyone is carrying a GPS unit with them nearly everywhere. This makes it easy to create your own GPS data that you can then load into PostGIS! This post explores some basics of loading GPS data to PostGIS and cleaning it for use. It turns out, GPS data from nearly any GPS-enabled device comes with some... character. Getting from the raw input to usable spatial data takes a bit of effort.
This post starts with using ogr2ogr to load the .gpx data to PostGIS. Once the data is in PostGIS, we actually want to do something with it. Before the data is completely usable, we should spend some time cleaning it. Technically you can start querying the data right away; however, I have found there is always data cleanup and processing involved first to make the data truly useful. This is especially true when using data collected over longer periods of time with a variety of users and data sources.
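As a small taste of that cleanup, the sketch below removes obviously unusable rows from a staging table. The table name, column names, and the choice of filters are all assumptions for illustration; the real cleanup depends on the device and the data.

-- Hypothetical staging table loaded from a .gpx file; adjust names to your import.
DELETE FROM track_points
    WHERE ts IS NULL     -- no timestamp recorded
        OR geom IS NULL  -- no usable geometry
;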
Travel database project
Before I started writing this post I had assumed that all of the code would be contained in its body. It turned out that to get to the quality I wanted, I had to create a new travel database project (MIT licensed) to share the code. This is already a long post without including a few hundred more lines of code! The travel-db project creates a few tables, a view, and a stored procedure in the travel schema. The stored procedure travel.load_bad_elf_points is responsible for cleaning and importing the data and weighs in at nearly 500 lines of code itself.
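For a quick look at what the project created, the objects in the travel schema can be listed from the standard information_schema views after running the project's scripts; this query is a generic sketch, not part of the project itself.

-- Tables and views created in the travel schema.
SELECT table_name, table_type
    FROM information_schema.tables
    WHERE table_schema = 'travel'
;

-- Functions and procedures, including travel.load_bad_elf_points.
SELECT routine_name, routine_type
    FROM information_schema.routines
    WHERE routine_schema = 'travel'
;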