Installing SciPy? You Need Swap!
I have been working in Python a lot lately. I've even written my own blog application, RustProof Content, using Flask. It serves almost-static content that is converted from Markdown to HTML and Jinja2 tags. If you're reading this now, you're looking at RustProof Content!
I've also done quite a bit of work with spatial data, and even dabbled with a bunch of weather data! Most of that was done in IPython Notebooks and I always end up wanting SciPy at some point. SciPy is great, but installing it on a small virtual machine can cause some headaches...
The Problem?
Installing the entire NumPy, SciPy and pandas stack on a single-core virtual machine with 1 GB of RAM typically takes 20-30 minutes, and that's if it doesn't crash in the middle of the process. With 1 GB of RAM and no swap enabled, the VM will probably crash while trying to install SciPy.
When the virtual machine freezes during this process, it freezes completely. If you happen to be SSH'd in with top running before it goes, the display keeps updating and shows a 1-minute load average in the range of 2.3 to 5, but you won't see a related process in the list below. In my case the terminal then became completely unresponsive and I had to do a hard shutdown and restart the machine.
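Before kicking off a long install, it can help to check how much memory and swap the machine actually has. The following is a minimal sketch, assuming a Linux system that exposes /proc/meminfo. The function name meminfo_mb is made up for this example, and it accepts an optional file path purely so it can be tried against a saved copy of the file.

```shell
#!/bin/sh
# meminfo_mb FIELD [FILE]: print a /proc/meminfo field (reported in kB) in whole MB.
# FILE defaults to /proc/meminfo; the function name is hypothetical.
meminfo_mb() {
    awk -v field="$1:" '$1 == field { printf "%d\n", $2 / 1024 }' "${2:-/proc/meminfo}"
}

# Report RAM and swap, and warn when no swap is configured (Linux only).
if [ -r /proc/meminfo ]; then
    echo "RAM:  $(meminfo_mb MemTotal) MB"
    echo "Swap: $(meminfo_mb SwapTotal) MB"
    if [ "$(meminfo_mb SwapTotal)" -eq 0 ]; then
        echo "No swap enabled; a memory-hungry install may hang this machine."
    fi
fi
```

If the swap line reads 0 MB on a 1 GB machine, that matches the setup that froze for me.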
The Quick Fix
Add 1 GB of swap before installing SciPy.
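# Create a file of zeros just under 1 GB (1,024,000 blocks of 1 KiB)
sudo dd if=/dev/zero of=/swapfile1 bs=1024 count=1024000
# Lock down ownership and permissions before the file is used as swap
sudo chown root:root /swapfile1
sudo chmod 0600 /swapfile1
# Format the file as swap space, then enable it
sudo mkswap /swapfile1
sudo swapon /swapfile1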
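Swap enabled with swapon lasts only until the next reboot. If the swap file should come back automatically, it needs an entry in /etc/fstab. Here is a minimal sketch, assuming the conventional `none swap sw 0 0` fields. The helper name add_swap_entry is invented for this example, and it takes the fstab path as an argument so it can be tried on a scratch file first.

```shell
#!/bin/sh
# add_swap_entry SWAPFILE FSTAB: append a swap entry for SWAPFILE to FSTAB
# unless the file already has an entry for that path.
# The function name is hypothetical; run it as root to edit the real /etc/fstab.
add_swap_entry() {
    if ! grep -q "^$1[[:space:]]" "$2" 2>/dev/null; then
        printf '%s none swap sw 0 0\n' "$1" >> "$2"
    fi
}

# Example (as root): add_swap_entry /swapfile1 /etc/fstab
```

The existence check keeps the function safe to run more than once without piling up duplicate lines.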
Now install SciPy:
pip3 install scipy
My virtual machine ended up using only about 80 MB of the swap... but it did use it!
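One way to watch swap usage yourself is to subtract SwapFree from SwapTotal in /proc/meminfo while the install is running. This is only a sketch, assuming Linux. The function name swap_used_mb is hypothetical, and it reports the usage at the moment it runs, not the peak.

```shell
#!/bin/sh
# swap_used_mb [FILE]: print swap currently in use, in whole MB,
# computed as SwapTotal - SwapFree from /proc/meminfo (values are in kB).
swap_used_mb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END { printf "%d\n", (total - free) / 1024 }' "${1:-/proc/meminfo}"
}

# Print the current figure when running on Linux.
if [ -r /proc/meminfo ]; then
    echo "Swap in use: $(swap_used_mb) MB"
fi
```

Running it every few seconds from a second SSH session during the install gives a rough picture of how much swap the build leans on.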
The Better Fix
Instead of half-isolating different Python environments and repeatedly installing the same components using virtualenv, I've started Dockerizing my Python environments. I've created a set of Dockerfiles that build various Python 3.4 environments. The basic image isn't used for much other than as a starting point for my other Python images. I've built a scientific image that includes everything listed above, and I also have an IPython Notebook image and a Flask image. Now I don't have to keep installing SciPy: I let Jenkins build the image while I sleep and download it when I need it.
I'll write more about how I'm using Docker for Python in an upcoming post.