PostgreSQL 10 Parallel Queries and Performance
Last week I was reviewing my list of topic ideas and I didn't feel like writing about any of them. So I headed to the PostgreSQL Slack Channel to ask for ideas. One of the great topic ideas was to explore the performance of PostgreSQL 10's improved Parallel Query feature and that sounded like a lot of fun, and something I should consider anyway. This feature was introduced in Pg 9.6 but has seen major improvements in the latest version.
Parallel query is enabled by default in PostgreSQL 10 and is controlled through the configuration value for
set to 2 by default.
To see more about configuring this feature, see this post.
Note: PostgreSQL changed their versioning policy with version 10.
Data for Testing
I decided to test this feature by loading the database of NOAA QCLCD weather data we've accumulated. There's about 10 years of daily weather observations in this database with that main table having nearly 4 million rows. This table is commonly joined to two tables, one storing the details about the weather station for each observation, the other linking to a calendar table.
Python 3 and Jupyter Notebooks
This post is intended to help beginners make progress in writing code. When first learning to write code, it is easy to get discouraged. I hope to help others who share my goals: begin to understand a complex subject such as Python 3, write better code, and have some fun along the way.
Why Python 3?
Python 3 is the first language I am learning because it is a great place for a novice to begin. There are lots of articles on the web explaining why Python 3 is such a good choice. Reading some of these can explain the “why” better than I am able to with my limited understanding of the subtleties of various languages. I found the following article particularly helpful.
Python 3 is used extensively at RustProof Labs and is a personal favorite of my boss, Ryan. Perhaps the most compelling reason for a novice, such as myself, to start with Python 3 is there are often fewer lines of code involved than, for example, with Java to accomplish a given task. Less lines of code can be a good thing. This makes debugging easier to accomplish.
My First Year with RustProof Labs
They say time flies when you’re having fun. For me, 2017 flew by in the blink of an eye. My one-year internship with RustProof Labs (RPL) is now complete. The first of many new experiences last year involved integrating myself into the corporate culture of RPL. I can say without reservation, RustProof Labs is the most open and friendly environment in which I have ever worked. In the past, I worked for one of the biggest tech companies in the world. My experience at RPL far outpaces that job and every other organization I have worked for in my career.
Why has my experience with RPL been so positive? There are a few important reasons. First, my individual abilities are greatly valued at RPL. Our visionary CEO/Owner understands it takes many different skill sets to be successful. My background in business management, customer relations, teaching technology, communication, and graphic design have all come into play in my daily work. I have often been surprised when something I am good at, which seems unrelated, fits perfectly with the task at hand.
Database Anti-patterns: Performance Killers
Databases are everywhere. They're in your computer, smartphone, WiFi enabled devices, and power all the web-driven technologies you use 24/7. Relational databases have the ability to provide incredible power, performance, reliance, and reporting. They also have the power to inflict severe pain, frustration, confusion and sleepless nights. You can expect the latter when a database is designed incorrectly.
This post attempts to summarize the most common, and most problematic relational database anti-patterns I have seen. These problems are not a technology problem; they are a training and experience problems. Some of these are mistakes I have made myself. Others I inherited in the form of a pre-existing system, and others still I was hired explicitly to fix.
Education is a Challenge
Relational database design, like cybersecurity, spans a wide range of very in-depth topics full of nuances and experience-driven decisions. In other words, there is a lot to learn and it isn't easy to teach. Another aspect of the problem is that database design isn't being taught very well, if at all, in typical database related coursework. Database courses focus on dry definitions of the various levels of normalization, yet spend zero effort teaching best practices for the design process or the decisions that go into determining the appropriate level of normalization.
Lots of theory, little application.
PostgreSQL Load Testing
A recent conversation got me thinking about database performance, specifically PostgreSQL. Well, to be honest, I'm always thinking about databases and their performance. I've pushed PostgreSQL to load 500,000 records per second, compared PostgreSQL to MySQL, and more.
This conversation did make me decide it was time to put one of RustProof Labs' newest PostgreSQL databases to the test. The database is part of a new product we plan to go live with in Spring 2018. Now is the perfect time for us to do a final check of our estimates for how many users this system will be able to handle. If our assumptions were wrong we still have time to adjust.