Ode to "The Database"
I'm a big fan of relational databases. Over the years I have tried out various ways to work with and manage data without using a database. Every time I try, I am reminded why I love databases so much: they do one job, and do it well. They manage your data. It's hard to argue with decades of success!
In this post I will be referencing PostgreSQL 9.4 specifically, but most of my arguments and examples should translate to any of the major RDBMSs, such as MS SQL Server, Oracle Database, or MySQL.
Colorado Unemployment: Making the Maps
Earlier this month I wrote about a few different software options for GIS-related needs. In this post I will give a high-level overview of how I used PostGIS and TileMill to make the maps for the video below. The video shows the unemployment rate for the state of Colorado by county for the years 2000 through 2013.
Maps! ArcGIS, QGIS, and TileMill
In this post I will discuss some of the options for working with spatial data. I have been using ESRI's ArcGIS suite for roughly four years now, but I have also recently started working with some more freely available tools, QGIS and TileMill, and thought this would be a good time to write about some of my experiences. You can see an example of what I've done with TileMill at my post here.
PostgreSQL Setup on Debian
Warning: This post is outdated. It is here for reference purposes only.
This post will go over how to install, set up, and configure a functional PostgreSQL database on a Debian 7 server. I will be doing this on a virtual machine in VirtualBox, but these steps should be valid for any Debian server. If you have never set up a server like this, first read my 3-part series that covers setting up a basic Debian 7 virtual machine. The first step is to install Postgres, which is very easy. Then we will move on to configuring the server, creating databases and users, and making it possible to connect to the server remotely.
I can't go over every possible option in this post, but I will do my best to show a reasonable way to set up a database server. You should evaluate all of my recommendations with your needs and policies in mind.
Data Processing in Python: PyTables vs PostgreSQL
One of the challenges when working with data is processing large amounts of it. Parsing out the data you really want, cleaning it up, and then being able to work with it effectively are key components to consider. In this post I'm going to try out PyTables, which utilizes HDF5 storage, and compare it with a popular relational database, PostgreSQL. I will be looking at how long it takes to load the data from its raw form (CSV format in a .txt file), how much space it takes on the server, and how easy it is to process and query the data once it is loaded.
The data I will be using for this test is weather data from NOAA. I am using data included in the QCLCD201312.zip archive, specifically the 201312hourly.txt and 201312station.txt files.