Fun With Python
Warning: This post is outdated. It is here for reference purposes only.
I think most people who write code have a favorite go-to language. For years my language of choice has been PHP. I always defaulted to PHP because I had done so much in it that it came naturally and I could focus on the task at hand, which is writing code. I think it's a good thing to use familiar tools because it makes you more efficient at solving the problem in front of you, but it's also vital to see what else is out there. I haven't been unhappy with PHP, but I had been wanting an excuse to dive into Python for a while, and a few weeks ago my reason to seriously give Python a try came up. I had a project that needed to get done, and it required working with a lot of data, doing some statistical analysis, and generating charts.
Modules Make the Difference
The modules available for use in Python truly do make the difference. The stand-out modules for me are pandas, NumPy, and matplotlib.
I'm still getting used to the different ways to store, manage, and manipulate data via pandas and NumPy, but the power and flexibility are staggering. pandas made it almost too easy to pull a data source directly from a database, import multiple raw data sources from CSV files, and then merge the records together into a joined data frame. I am still struggling a bit with common operations like filtering out a subset of results, but that is because I haven't figured out all the nuances of the different data structures available. That's a minor setback, and it should smooth itself out as I become more familiar with these operations and the language.
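The workflow described above (database query, several CSV sources, a merge, then a filter) can be sketched roughly as follows. This is not the code from the project: the table, the column names (`station_id`, `city`, `temp_f`), and all of the values are made up for illustration, an in-memory SQLite database stands in for a real database, and `StringIO` objects stand in for CSV files on disk.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical table; an in-memory SQLite database stands in for a real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stations (station_id INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO stations VALUES (?, ?)",
    [(1, "Denver"), (2, "Boulder"), (3, "Pueblo")],
)
conn.commit()

# Pull a query result straight into a DataFrame.
stations = pd.read_sql_query("SELECT station_id, city FROM stations", conn)
conn.close()

# Two made-up CSV sources; StringIO stands in for files on disk.
january = io.StringIO("station_id,temp_f\n1,30.5\n2,28.0\n3,35.2\n")
february = io.StringIO("station_id,temp_f\n1,34.1\n2,31.7\n3,39.0\n")

# Read each source and stack the rows into a single frame.
readings = pd.concat(
    [pd.read_csv(january), pd.read_csv(february)],
    ignore_index=True,
)

# Join the readings to the station records on the shared key.
merged = readings.merge(stations, on="station_id", how="left")

# Filter out a subset of rows with a boolean mask.
denver = merged[merged["city"] == "Denver"]
print(denver)
```

The boolean mask in the last step is one common way to filter a subset; `merged.query('city == "Denver"')` should give the same rows if a string expression reads better.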
Beyond pandas and NumPy making data management and manipulation much too easy, matplotlib ensured my love by letting me generate nice-looking charts. Over the past few years I have used a number of open source charting libraries to generate data-driven visualizations and have recently enjoyed the Google Charts API... but matplotlib is easier and better IMO. It might not have all the bells and whistles the Google API does, especially when it comes to hover-over info on the chart, but having it available directly in Python, right next to the data, makes things much more streamlined. Also, since Google has a history of decommissioning my favorite Google products, I am trying to avoid becoming entrenched in anything they provide outside of Gmail.
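For a sense of how little code a basic chart takes, here is a minimal sketch. The numbers and labels are invented for illustration, and the `Agg` backend is an assumption that the code runs somewhere without a display (such as a server); in a notebook the backend line is usually unnecessary.

```python
import io

import matplotlib

matplotlib.use("Agg")  # render without a display, e.g. on a server
import matplotlib.pyplot as plt

# Made-up values for illustration only.
days = [1, 2, 3, 4, 5]
avg_temp_f = [31.0, 28.5, 35.2, 40.1, 38.4]

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(days, avg_temp_f, marker="o", label="Avg temp (F)")
ax.set_xlabel("Day")
ax.set_ylabel("Temperature (F)")
ax.set_title("Average daily temperature")
ax.legend()
fig.tight_layout()

# Write the chart as PNG bytes; pass a file path instead to save to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
png_bytes = buffer.getvalue()
```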
IPython (Now Jupyter) Rocks
When I first started my project in Python I saw some rah-rah'ing about IPython (now Jupyter) Notebook, so I took a quick look and decided to come back to it after the project was done. What a mistake that was! To be fair to myself, I had more than enough on my plate, and I chose to skip the time it would take to install, configure, and figure out IPython so I could hopefully just complete the project and get caught up. However... I would have saved hours if I had just bitten the bullet and checked it out at the start.
For me, the IPython shell itself isn't all that exciting. Sure, it has some big improvements over the plain vanilla Python shell, but honestly they didn't make much of an impact on me. My experience with working in a language at the command line is that after a few commands I need to go back to my first command and run them all again, in order, from the start. Yes, I could "arrow up" to cycle through my command history, but that becomes terribly inefficient once you have more than three commands.
The solution: Jupyter Notebook!
Jupyter Notebooks allow you to write and execute Python code in your browser. More importantly, they let you execute small chunks of Python code at a time, as you see fit. Each "cell" in the notebook can be run on its own, like a small script. This means that I can write a little, test a little, write a bit more, and test a bit more. As the code grows and evolves, it is wonderfully easy to restructure, refactor, and test your changes in small chunks.
Notebooks can also display cells as Markdown-formatted text, and I think that's brilliant! I much prefer having nicely formatted text explaining the code where needed over having ugly comments thrown everywhere in the code. This also gives you a nice way to explain the results returned by the code directly in line with the results (and the code). The screenshot below shows the toolbar in an IPython notebook, a cell of Markdown, and a small block of code with its results.
Performance Notes
I'm rather impressed with Python's performance. I went ahead and set up a notebook server on a Debian virtual machine with one core and 1 GB of RAM. Just as a reference point: by my definition, that VM is not a powerful server. (Watch for a future post on setting up a remote IPython notebook server!) To test Python's performance with data handling and processing, I obtained daily weather data for 2013 from NOAA and wrote a script to work with it. I'm importing each month of data for 2013 and calculating the average daily temperature and wind speed for all of Colorado as well as for just the Denver area. The raw text files contain roughly 450,000 records, from which I filtered out the subset I needed and then cleaned and processed the remaining data. The screenshot below shows one of the charts the script generates in under 30 seconds. (All done in IPython Notebook!)
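The shape of that processing (load each month, clean, filter, average per day) might look something like the sketch below. The post does not show the real NOAA file layout, so the column names (`date`, `state`, `city`, `temp_f`, `wind_mph`) and every value here are hypothetical stand-ins, and the "Denver area" is approximated by a simple city match.

```python
import io

import pandas as pd

# Hypothetical monthly files; StringIO stands in for the raw text files.
# The real NOAA layout and column names are not given in the post.
monthly_files = [
    io.StringIO(
        "date,state,city,temp_f,wind_mph\n"
        "2013-01-01,CO,Denver,30.0,8.0\n"
        "2013-01-01,CO,Pueblo,36.0,12.0\n"
        "2013-01-01,WY,Cheyenne,22.0,20.0\n"
    ),
    io.StringIO(
        "date,state,city,temp_f,wind_mph\n"
        "2013-02-01,CO,Denver,34.0,10.0\n"
        "2013-02-01,CO,Pueblo,40.0,6.0\n"
        "2013-02-01,,Unknown,,\n"
    ),
]

# Import every month and stack the rows into one frame.
weather = pd.concat(
    [pd.read_csv(f, parse_dates=["date"]) for f in monthly_files],
    ignore_index=True,
)

# Clean: drop rows that are missing either measurement.
weather = weather.dropna(subset=["temp_f", "wind_mph"])

# Filter: all of Colorado, then just Denver.
colorado = weather[weather["state"] == "CO"]
denver = colorado[colorado["city"] == "Denver"]

# Average temperature and wind speed per day for each subset.
colorado_daily = colorado.groupby("date")[["temp_f", "wind_mph"]].mean()
denver_daily = denver.groupby("date")[["temp_f", "wind_mph"]].mean()

print(colorado_daily)
print(denver_daily)
```

Either daily frame can be charted directly, for example with `colorado_daily["temp_f"].plot()`, which uses matplotlib under the hood.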
Final Thoughts
I'm really enjoying the Python language. Yes, the language itself is nice to write in, easy to understand, and all that jazz... but the real value is how the community has come together to build the wide variety of tools that currently exist for Python. The interactive notebook and the powerful data management/analysis tools have won me over, and I expect that PHP will be much less prominent in my future projects.
Final parting thought: Python has booted Java from its reign as the most popular teaching language!
If you have a different opinion or if I missed something, let me know in the comments!