RustProof Labs: blogging for education
My book Mastering PostGIS and OpenStreetMap is available!

PostgreSQL 10 Parallel Queries and Performance

By Ryan Lambert -- Published February 12, 2018

Last week I was reviewing my list of topic ideas and didn't feel like writing about any of them, so I headed to the PostgreSQL Slack channel to ask for ideas. One of the best suggestions was to explore the performance of PostgreSQL 10's improved parallel query feature. That sounded like a lot of fun, and it was something I should look into anyway. Parallel query was introduced in PostgreSQL 9.6 but has seen major improvements in version 10.

April 2019 update: I put parallel query in PostgreSQL 11 to the test on the Raspberry Pi 3B. Check it out!

Parallel query is enabled by default in PostgreSQL 10 and is controlled by the max_parallel_workers_per_gather configuration parameter, which defaults to 2. For more on configuring this feature, see this post.
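To make the role of max_parallel_workers_per_gather more concrete, here is a simplified Python model of the planner's table-size heuristic, based on my reading of compute_parallel_worker() in the PostgreSQL 10 source. The function name planned_workers and its parameter names are hypothetical; min_parallel_table_scan_size (default 8MB) and max_parallel_workers_per_gather are real settings. Treat it as a sketch, not the planner's exact behavior: it only estimates how many workers a parallel sequential scan would request, and it ignores cost estimates, index scans, the per-table parallel_workers storage parameter, and runtime limits such as max_parallel_workers.

```python
# Hypothetical, simplified model of how the PostgreSQL 10 planner sizes a
# parallel sequential scan. Assumes the default 8kB block size and ignores
# cost-based plan selection, index scans, the per-table parallel_workers
# storage parameter, and runtime limits (max_parallel_workers,
# max_worker_processes).

PAGE_SIZE_KB = 8  # default PostgreSQL block size


def planned_workers(heap_pages: int,
                    min_parallel_table_scan_size_kb: int = 8 * 1024,
                    max_parallel_workers_per_gather: int = 2) -> int:
    """Estimate how many workers the planner would request for a table scan."""
    # Convert the size threshold from kB to pages (8MB -> 1024 pages).
    threshold = max(min_parallel_table_scan_size_kb // PAGE_SIZE_KB, 1)

    # Tables smaller than min_parallel_table_scan_size get no parallel workers.
    if heap_pages < threshold:
        return 0

    # Start with one worker and add another each time the table is 3x larger.
    workers = 1
    while heap_pages >= threshold * 3:
        workers += 1
        threshold *= 3

    # max_parallel_workers_per_gather caps the result; 0 disables parallel query.
    return min(workers, max_parallel_workers_per_gather)


if __name__ == "__main__":
    for pages in (500, 1024, 3072, 9216):
        print(pages, planned_workers(pages))
```

Under these assumptions, a 9,216-page (72MB) table would earn three workers by size alone, but the default max_parallel_workers_per_gather of 2 caps the request at two.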

Note: PostgreSQL changed its versioning policy with version 10.

Data for Testing

I decided to test this feature using the database of NOAA QCLCD weather data we've accumulated. It holds about 10 years of daily weather observations, and the main table has nearly 4 million rows. This table is commonly joined to two others: one storing details about the weather station for each observation, the other a calendar table.

Continue Reading

Database Anti-patterns: Performance Killers

By Ryan Lambert -- Published January 28, 2018

Databases are everywhere. They're in your computer, your smartphone, and your WiFi-enabled devices, and they power all the web-driven technologies you use 24/7. Relational databases can provide incredible power, performance, reliability, and reporting. They also have the power to inflict severe pain, frustration, confusion, and sleepless nights. You can expect the latter when a database is designed incorrectly.

This post attempts to summarize the most common and most problematic relational database anti-patterns I have seen. These are not technology problems; they are training and experience problems. Some of these are mistakes I have made myself. Others I inherited in the form of a pre-existing system, and others still I was hired explicitly to fix.

Education is a Challenge

Relational database design, like cybersecurity, spans a wide range of very in-depth topics full of nuances and experience-driven decisions. In other words, there is a lot to learn and it isn't easy to teach. Another aspect of the problem is that database design isn't being taught very well, if at all, in typical database-related coursework. Database courses focus on dry definitions of the various levels of normalization, yet spend zero effort teaching best practices for the design process or the decisions that go into determining the appropriate level of normalization.

Continue Reading

PostgreSQL Load Testing

By Ryan Lambert -- Published December 09, 2017

A recent conversation got me thinking about database performance, specifically PostgreSQL. Well, to be honest, I'm always thinking about databases and their performance. I've pushed PostgreSQL to load 500,000 records per second, compared PostgreSQL to MySQL, and more.

This conversation did make me decide it was time to put one of RustProof Labs' newest PostgreSQL databases to the test. The database is part of a new product we plan to launch in spring 2018. Now is the perfect time for us to do a final check of our estimates for how many users this system will be able to handle. If our assumptions are wrong, we still have time to adjust.

Continue Reading

Goodbye, NewRelic

By Ryan Lambert -- Published October 14, 2017

I'm sad to say that RustProof Labs will no longer be using NewRelic's monitoring services. They are making some significant changes to their offerings, and unfortunately that means RustProof Labs will be finding new solutions for our infrastructure monitoring. They are removing their "Servers" monitoring feature from the free tier, and Servers is the only NewRelic service we really used. Their paid-only replacement for monitoring servers is Infrastructure.

What's Changing?

The main difference seems to be that they're pushing integrations with some of the main cloud providers (AWS, Azure, etc.). I didn't want that before and I don't want it now. I loved being able to see my servers' health at a high level: CPU and RAM usage, disk space, and so on. I loved being able to install their service on all my local virtual machines as well as my cloud servers. I could easily visualize the impact of HDD versus SSD drives on specific processes.

Infrastructure Pricing

Note: These pricing details are accurate at the time of writing.

Continuing to monitor our servers with NewRelic's Infrastructure product appears to start at $9.90/month. That's more than the monthly cost of a small cloud-based server!

"The minimum # of CUs to be purchased as part of any customer subscription is 16,500 costing $9.90. This is the smallest contract size."

Continue Reading

Thoughts on Data Security

By Ryan Lambert -- Published September 16, 2017

The recent Equifax data breach got me thinking more deeply about the role RustProof Labs plays in cybersecurity. I'm not just talking about how we approach cybersecurity to protect ourselves and our clients, but also about the larger need for more companies and individuals to take cybersecurity seriously.

Brian Krebs recently wrote about an egregious error made by Equifax that apparently isn't even related to the main breach:

It took almost no time for them to discover that an online portal designed to let Equifax employees in Argentina manage credit report disputes from consumers in that country was wide open, protected by perhaps the most easy-to-guess password combination ever: “admin/admin.”

This kind of vulnerability is inexcusable considering the magnitude of the data that Equifax is responsible for securing.

Our Personal Data is Priceless

To me, my own personal information is worth more than a bank vault full of gold. I want everyone who stores data about me to treat it that way. Unfortunately that is not the reality today.

Continue Reading
