Colorado Unemployment: Making the Maps
Earlier this month I wrote about a few different software options for GIS-related needs. In this post I will give a high-level overview of how I used PostGIS and TileMill to make the maps for the video below. The video shows the unemployment rate by county for the state of Colorado from 2000 through 2013.
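To give a sense of what that looks like, here is a minimal sketch of the kind of query a TileMill PostGIS layer can be built on, run from Python with psycopg2 as a sanity check. The database, table, and column names (colorado, co_counties, unemployment, and so on) are placeholders, not the actual schema from this project.

```python
# Hypothetical sanity check of the query behind a TileMill PostGIS layer.
# The database, table, and column names here are placeholders.
import psycopg2

# The layer query: county polygons joined to one year of unemployment rates.
LAYER_QUERY = """
    SELECT c.geom, c.name, u.rate
    FROM co_counties c
    JOIN unemployment u ON u.county_fips = c.fips
    WHERE u.year = 2013
"""

conn = psycopg2.connect(dbname="colorado", user="gis", host="localhost")
cur = conn.cursor()

# Confirm the join returns one row per county before styling it in TileMill.
cur.execute("SELECT count(*), min(rate), max(rate) FROM (%s) AS layer" % LAYER_QUERY)
print(cur.fetchone())

cur.close()
conn.close()
```

In TileMill, roughly the same SELECT, wrapped in parentheses and given an alias, can be pasted into the PostGIS layer's table field, with the rate column driving the CartoCSS fill color.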
Maps! ArcGIS, QGIS, and TileMill
In this post I will discuss some of the options for working with spatial data. I have been using ESRI's ArcGIS suite for roughly four years, but I have also recently started working with two freely available tools, QGIS and TileMill, and thought this would be a good time to write about my experiences with them. You can see an example of what I've done with TileMill in my post here.
PostgreSQL Setup on Debian
Warning: This post is outdated. It is here for reference purposes only.
This post will go over how to install, set up, and configure a functional PostgreSQL database on a Debian 7 server. I will be doing this on a virtual machine in VirtualBox, but these steps should be valid for any Debian server. If you have never set up a server like this, first read my 3-part series that covers setting up a basic Debian 7 virtual machine. The first step is to install Postgres, which is very easy; then we will move on to configuring the server, creating databases and users, and even making it possible to connect to the server remotely.
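As a preview of the remote-access piece, here is a minimal connection test you could run from a client machine once the server is configured. The host, database, role, and password are placeholders for whatever you create later in the post.

```python
# A minimal remote-connection test, assuming a role and database already
# exist on the Debian server. Run this from a client machine after enabling
# remote access in postgresql.conf (listen_addresses) and pg_hba.conf.
import psycopg2

try:
    conn = psycopg2.connect(
        host="192.168.56.10",   # the Debian VM's address (placeholder)
        port=5432,
        dbname="testdb",
        user="testuser",
        password="changeme",
    )
    cur = conn.cursor()
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])
    cur.close()
    conn.close()
except psycopg2.OperationalError as exc:
    # Most failures here point at pg_hba.conf rules or firewall settings.
    print(f"Connection failed: {exc}")
```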
I can't go over every possible option in this post, but I will do my best to lay out a solid way to set up a database server. You should evaluate all of my recommendations with your own needs and policies in mind.
Data Processing in Python: PyTables vs PostgreSQL
One of the challenges when working with data is processing large amounts of it. Parsing out the data you really want, cleaning it up, and then being able to work with it effectively are key considerations. In this post I'm going to try out PyTables, which uses HDF5 storage, and compare it with a popular relational database, PostgreSQL. I will look at how long it takes to load the data from its raw form (a CSV-formatted txt file), how much space it takes up on the server, and how easy it is to process and query the data once it is loaded.
The data I will be using for this test is weather data from NOAA. I am using data included in the QCLCD201312.zip archive, specifically the 201312hourly.txt and 201312station.txt files.
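As a rough illustration of the PyTables side of the comparison, the sketch below streams rows from the raw CSV into an HDF5 table. The table layout and the column names read from the file header (WBAN, Date, DryBulbFarenheit) are assumptions covering just a few fields; the real file has many more columns, so adjust the description to match the actual header.

```python
# Sketch: load a few fields from the raw CSV into a PyTables HDF5 table.
# Column names from the file header are assumptions; adjust as needed.
import csv
import tables

class Observation(tables.IsDescription):
    wban = tables.StringCol(5)        # station identifier
    date = tables.StringCol(8)        # YYYYMMDD
    dry_bulb_f = tables.Float32Col()  # dry bulb temperature, deg F

h5 = tables.open_file("weather_201312.h5", mode="w", title="QCLCD Dec 2013")
table = h5.create_table("/", "hourly", Observation, "hourly observations")
row = table.row

with open("201312hourly.txt", newline="") as f:
    for rec in csv.DictReader(f):
        row["wban"] = rec["WBAN"]
        row["date"] = rec["Date"]
        try:
            row["dry_bulb_f"] = float(rec["DryBulbFarenheit"])
        except ValueError:
            continue  # skip rows with missing/non-numeric values
        row.append()

table.flush()
h5.close()
```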
Python and Databases
With all the cool things I have recently discovered with Python, and some headaches with one of our systems at work, I wanted to make a case for setting up a dedicated Python notebook server for my department. Before I get into the fun details of testing the setup, I should probably explain our needs. My goal is to replace our (very expensive) SPSS licenses with something not so expensive. It just so happens that Python, Pandas, Matplotlib, IPython, and all the other goodies come at the correct price of Free. The only cost should be the virtual server needed to run it all.
To evaluate whether Python could provide a viable alternative, I had to ask: what do we use SPSS for currently? Well, we load data from CSV files into MS SQL and run an occasional statistical analysis. That's about it.
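As a rough sketch of what those two tasks look like in Python, the snippet below loads a CSV with pandas, runs basic descriptive statistics, and pushes the data into MS SQL through SQLAlchemy. The file name, table name, and connection string are placeholders, not our actual environment.

```python
# Sketch of the two SPSS tasks in Python: CSV -> MS SQL, plus basic stats.
# File name, table name, and connection string are placeholders.
import pandas as pd
from sqlalchemy import create_engine

# Read the raw CSV; pandas infers column types.
df = pd.read_csv("enrollment.csv")

# Quick descriptive statistics, roughly what we use SPSS for today.
print(df.describe())

# Push the frame into MS SQL Server (assumes an ODBC DSN named "reporting").
engine = create_engine("mssql+pyodbc://user:password@reporting")
df.to_sql("enrollment", engine, if_exists="replace", index=False)
```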