PostgreSQL vs. MySQL: Why we use PostgreSQL
I fully believe that PostgreSQL is the best choice for a reliable and high performance database. This post was inspired after reading a post on Udemy concluding with MySQL being better than PostgreSQL. There are a number of points I disagree with in that post so I decided to make my case of why I think PostgreSQL is a better choice in most situations. In this post I start by countering a few statements from the Udemy post, providing examples from my experience and other resources. Towards the end I go over a few of the PostgreSQL features that I believe makes it truly stand out.
If all you need is to support 3rd party software that supports MySQL (and not PostgreSQL), then you obviously don't have a choice and this discussion is irrelevant. On the other hand, if you're looking for a powerful, reliable, and open source relational database platform:
PostgreSQL is my recommendation.
Experience Supported with Facts
I've been developing, maintaining and administering databases for more than a decade. I've developed databases professionally for three different platforms starting with MySQL in 2004, then MS SQL Server in 2008 and, most recently, PostgreSQL in 2010. I've also dabbled with Oracle DB, but not enough to count my experience worthwhile here. Of all of them, PostgreSQL is by far my favorite. I have also experimented with NoSQL offerings Hadoop, Cassandra and Mongo DB but have yet to find a use case that is better suited for those products.
While I have experience with many products, I am trying to keep this post to just MySQL and PostgreSQL.
Preferences are Personal Choices
I'm saying I like PostgreSQL better, but that doesn't mean I think MySQL is bad. Similarly, I chose to use Python over PHP or Visual C#, not because those languages can't get the job done... but because I like Python more for:
- The type of work I do, and
- The way I like to work
Similarly, I prefer PostgreSQL over MySQL because I feel I can build a more secure, more maintainable, and more reliable product with less effort.
Issue: Not Following SQL Standards
Back to the point of this post. The first section of the Udemy article starts with:
"One issue with off-shoot SQL database engines"
What is meant by "off-shoot SQL database engines?" I assume the author is referring to the fact that PostgreSQL spawned from Ingres back in 1996. MySQL came about in 1995, so looking at age under the current projects MySQL is only one year older.
Moving on, back to standards:
"While PostgreSQL does follow many of the changes, it has a lot of additional coding that is not regulated as part of SQL standardization. When you’re learning a language, it’s best to learn a language that follows basic standards."
That statement sounds like PostgreSQL doesn't follow the SQL standard. According to the PostgreSQL Docs:
"PostgreSQL supports most of the major features of SQL:2011. Out of 179 mandatory features required for full Core conformance, PostgreSQL conforms to at least 160. In addition, there is a long list of supported optional features. It might be worth noting that at the time of writing, no current version of any database management system claims full conformance to Core SQL:2011."
So PostgreSQL has
160 / 179 mandatory features = 89% which is a B+. That's pretty good, but not perfect.
Though, on that note I haven't been able to find any official statement from MySQL about that ratio
and my experience shows they deviate from the standards themselves. If you can find some metric of
compliance from MySQL, please contact me with details and I will update this post.
Another way to interpret the quote above is that PostgreSQL has too many "extras" that aren't defined in the SQL Standard. I'd argue those features, some discussed below under "Why I Love PostgreSQL", are what make PostgreSQL the amazing beast it is.
I also find it interesting the author uses the "extras" as a negative while the MySQL docs say:
"We are not afraid to add extensions to SQL or support for non-SQL features if this greatly increases the usability of MySQL Server for a large segment of our user base."
So, when PostgreSQL adds extra functionality it's a problem, but when MySQL adds it in it's a bonus? PostgreSQL follows the vast majority of the standards and adds the addition of some stellar improvements.
Issue: Security vs. Scalability
"New development is one of the most important features with a startup, especially if your startup is in technology. Your product will probably be more secure if you choose PostgreSQL, but it will be more scalable if you choose MySQL."
I'm unsure why the author thinks MySQL is more scalable than PostgreSQL, there's no supporting details of how that's true. Maybe just because they heard that Facebook runs off MySQL? That detail aside, the more important issue is that if you're given the following choices, which should you choose?
- More Secure
- More Scalable
With data breaches now a regular part of the headline you should not just decide to go for ultimate scalability. Sure, keep it in mind as a goal but in most cases you don't need Amazon's level of performance. If you don't have security as a priority you should not be worried about scalability. Especially if your startup is in technology! It's also good to note that 90% of startups fail. So the delusion that you must be instantly scalable is silly at best.
Focus on security to help prevent a data breach from ruining your startup.
"Ultimately, PostgreSQL only gives you an advantage when you are working with proprietary software (such as Oracle) because it more easily integrates their modifications."
This statement confuses me greatly, because Oracle has their own database (see my post on installing Oracle DB) and I'm unaware of any proprietary Oracle software that only works with PostgreSQL. I'm also unaware of any proprietary software in general that only supports PostgreSQL. Further, Oracle bought MySQL in January 2010 when it purchased Sun, which had bought MySQL in 2008.
"You wouldn’t use a power tool to put a nail in the wall to hang a picture, and you wouldn’t use PostgtreSQL to work with an Access database."
Well, I certainly wouldn't use MySQL to work with Access either! I would recommend migrating the Access application to a reliable platform.
Issue: Terabytes of Data
"If you plan to collect several terabytes of data, you might choose PostgreSQL for its data organization and protection."
Are you saying that MySQL is an unorganized RDMS? Either way, we're back to the security of data
again. It seems the author only thinks it's worth protecting data if there's terabytes of it?
Critical data comes in all sizes, from bytes to terabytes. It only takes 9 bytes in a
field to store my SSN, but that's worth keeping secure! If you don't bother protecting the small
data you won't (shouldn't!) survive long enough to collect more.
1 TB would hold just over 122 Billion Social Security Numbers
Why I Love PostgreSQL
Other than being open-source and powerful, there are a few things that really make PostgreSQL amazing.
- Small Footprint
- Date Math (Interval)
- Unlogged tables ("Stupid fast")
I run multiple PostgreSQL servers within Docker on a single core, 512 MB RAM VPS from Digital Ocean. I have roughly 8 - 10 GB of data across these databases at any given time. This is all data from NOAA, OpenStreetMap.org, and the US Census Bureau. Everything runs smoothly, and I still feel like I have multiple powerful servers working for me.
PostGIS is an extension for PostgreSQL to enable storing and processing spatial (geographical) data. I've been using PostGIS for almost 2 years now, and other offerings just aren't up to par. You can read how we're implementing PostGIS for our upcoming service, RustProof Data. Another example of how I used PosGIS is here.
PostgreSQL has provided a simplified syntax for casting data to data types.
column::DATATYPE. I use that all the time because I have no plans to convert from
PostgreSQL to any other database, but if you're worried about cross-platform compatability, you
can still use the SQL standard:
CAST(column as DATATYPE. It's your choice.
Date Math (Interval)
Working with dates in general seems to be a pain. There are dates, and dates with times, and sometimes the time is always midnight (just wasting space). I've noticed I have far less issues manipulating dates in PostgreSQL due to the syntax. For some reason I can just remember it.
Getting the value for 1 hour earlier than now...
The PostgreSQL way:
SELECT NOW() - INTERVAL '1 hour'
The MySQL way:
SELECT DATE_ADD(NOW() ,INTERVAL -1 HOUR)
The MS SQL Server way:
SELECT DATEADD(hh, -1, GETDATE())
I became intrigued by UNLOGGED tables after reading this post. For loading a lot of data as quickly as possible, this is a really great feature. I was curious how much of a performance boost it would provide so I tested it.
Con Against PostgreSQL
To be fair, I will say there's one big hurdle to PostgreSQL. It's much harder to learn from scratch if you're responsible for installing and configuring the server. If you're willing to become familiar (or already familiary) with Docker and want a good way to setup a test server, check out my post on Dockerizing PostgreSQL.
Obviously I am biased towards PostgreSQL compared to MySQL and I believe it's the better database choice for any new development projects. Further, just because something is a little harder to learn it doesn't mean it's a bad thing. A perfectly grilled ribeye takes more practice to get right than mac & cheese. That doesn't mean the steak isn't worth the practice! If you want to do something right (store and retrieve data) use the right tool.
If you need help getting setup with PostgreSQL please contact us about with what your database needs are, and we'll be in touch.
Published August 01, 2016
Last Updated July 31, 2016