PostgreSQL Crash Course
PostgreSQL (postgres, pg) is a powerful, open-source relational database (RMDS). It is the database of choice here at RustProof Labs, I wrote previously why I chose PostgreSQL over MySQL. This post provides a 10-mile view of how PostgreSQL is structured, tools to use, and other important concepts. The goal is to provide a broad view of PostgreSQL with resources to more detail on each topic, without getting caught up in too much technical jargon.
This post is part of the series PostgreSQL: From Idea to Database.
PostgreSQL will run on most any modern hardware and operating systems. PostgreSQL has installation guides for a large number of operating systems, supporting most Linux/Unix, Mac, and Windows. As for hardware, it will run on nearly anything modern, including a Raspberry Pi (ARM processor). See the PostgreSQL supported platforms for more details.
I have successfully ran PostgreSQL on Linux servers of all sizes, from tiny to powerful. I previously wrote about load testing PostgreSQL that gives more detail about some of the systems it can run on and what type of performance to expect.
Structure and Terminology
Every RDMS has its own way to organize the elements required for a reliable database system. PostgreSQL has the following hierarchy to organize the required elements.
Cluster > Database > Schema > Table > Data
Data is stored as rows, contained in tables, within a schema, within a database, within a cluster. The PostgreSQL Concepts page provides a nice explanation of this overall structure.
When I talk about PostgreSQL I'm talking about PostgreSQL, the database server, which also may be referred to as a cluster or instance. Yes, PostgreSQL is server software. For production databases, Postgres is often install on a dedicated host (hardware or VM), though for development and testing it is possible to install PostgreSQL on your desktop or laptop computer.
When you use tools like
psql or PgAdmin to connect to
a PostgreSQL server, you are using client software designed to interface with
existing PostgreSQL servers (instances). This is the other half of the
used by PostgreSQL (and most other reputable databases).
See the PostgreSQL client tools section for more details.
A PostgreSQL cluster is what is commonly called a "PostgreSQL server." The database cluster handles connections, authentications, executing queries, and so on; it is the backbone for the remainder of functionality. From the PostgreSQL docs:
"A database cluster is a collection of databases that is managed by a single instance of a running database server."
Login users and groups (
ROLEs) are managed at the cluster level, though the
security of objects and data can be granted in much finer detail at the database, schema, table and even row level.
A database in PostgreSQL provides a logical grouping of objects within a PostgreSQL cluster. Each database contains its own tables, data and more. Connections to a PostgreSQL cluster will specify which database to use for the connection, determining what data is available.
One detail of note with PostgreSQL is that cross-database queries are not as simple as referencing a table in another
database. Some databases, such as MS SQL Server, allow cross-database queries by using
the fully qualified name including the database's name. The following SQL code references two databases,
SELECT * FROM db1.schema.table1 t1 INNER JOIN db2.schema.table2 t2 ON t1.id = t2.id
In PostgreSQL, cross-database queries are not possible without the use of an extension, such as foreign data wrappers (FDW).
Schema's in PostgreSQL
provide a way to separate database objects into functional groupings.
PostgreSQL databases have a schema named
public by default. In general, I prefer to avoid putting much
public schema, though I do put a few common database tables get there. When I use a
table, those go in the
One major benefit of schemas is they provide an easy to grant and/or deny access to database objects
(tables, views, functions, etc) to groups of users based on their functional role. An example of allowing
a users with the
sales_group to execute all functions in the
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA sales TO sales_group;
A table is a collection of structured data seen as columns and rows. The columns of a table define the structure of the data to be stored. The rows of a table are what most people would think of as "the data." If you work with data in spreadsheets, you've probably seen a decent representation of a table in PostgreSQL.
The following SQL code would create a table in the
sales schema named
location with two columns.
CREATE TABLE sales.location ( location_id SERIAL NOT NULL PRIMARY KEY, name TEXT NOT NULL );
SERIALdata type in PostgreSQL essentially creates an auto-incrementing
This code would insert a row for a location named 'Headquarters' into the
INSERT INTO sales.location (name) VALUES ('Headquarters');
PostgreSQL Client Tools
Learn how to use
psql is the most important tool to learn for working with PostgreSQL.
I know, it's a command line tool, and that can take some getting used to.
Even if you aren't guru of using the command line, this is worthy of your time investment to learn.
One of the mistakes I made early on was not learning
how to customize
to my liking.
psqlis the most important tool to learn for working with PostgreSQL.
There are times when
psql isn't ideal. A good example is when I'm working with
spatial data using PostGIS
and I need to visualize the results to help verify the results...
PgAdmin 4's geometry viewer rocks for that!
There are a plethora of GUI software options that work with PostgreSQL. Each of the GUI options for PostgreSQL has its own pros and cons to different users. A few options I have used are listed here.
PgAdmin 4 gets a lot of disdain from the community it seems because of its browser based interface. I think the only reason I have any complaint with that UI choice is my constant use of F5, which I use to execute queries against the database... but it also refreshes your browser and that can cause accidental data loss of your query!
Valentina Studio has both free and paid versions. I use the free version mostly for creating relationship diagrams of existing databases to include in internal documentation. OmniDb is the one I have the least experience with, not for any reason but I discovered it last.
psqlis my go-to choice when working with PostgreSQL.
PostgreSQL extensions are a big part of what makes PostgreSQL so awesome. Extensions provide a way to enable custom functionality in PostgreSQL very simply and cleanly. For example, the PostGIS extension allows any PostgreSQL database to turn into a powerful, GIS-enabled database.
CREATE EXTENSION postgis;
There are quite a few extensions included with PostgreSQL's contrib modules.
Backing up and restoring databases
Unless you don't care at all about your data, you must have a proper backup and restore plan.
The best I've found out there is pgBackRest. They have a thorough
user guide that walks you step-by-step through the process.
You can setup a dedicated backup server, schedule backups (
cron), and have a solid backup and restore system
in a few hours.
For other backup needs, there are these command line utilities that should be installed with PostgreSQL by default.
Most non-trivial database implementations should consider implementing streaming replication. This provides a relatively straight-forward way to keep one or more secondary servers available with an almost real-time copy of your databases. See the PostgreSQL Wiki for more options regarding replication, clustering and connection pooling.
Security and Permissions
PostgreSQL users login users and user groups, each called a
ROLE with specific attributes. It's generally
best to assign permissions to a user group for a specific function and assign users to that group to give
the appropriate access.
PostgreSQL uses the
pg_hba.conf file to define which connections are allowed to the server and the methods used
Read about these concepts
in more detail.
A neat feature with Postgres is you can easily see most, if not all, server settings when you
are connected to the database in a client tool, such as
psql. The command
SHOW <setting name>;
will return the currently set value. The following command shows how to view the
setting, commonly changes during performance tuning.
SHOW shared_buffers; ┌────────────────┐ │ shared_buffers │ ╞════════════════╡ │ 400MB │ └────────────────┘
Gotcha: Case sensitive by default
Users coming to PostgreSQL from other databases get caught by this quite often.
Your text in your database is case sensitive by default, meaning
Test <> test <> TEST.
<> means not equal)
This is really important with things like usernames and email addresses, but you can
easily get unexpected results with any text field.
SELECT 'Test' = 'Test' AS same, 'Test' = 'TEST' AS diff; ┌──────┬──────┐ │ same │ diff │ ╞══════╪══════╡ │ t │ f │ └──────┴──────┘
If you're using a
LIKE comparison you can
ILIKE instead. "The key word ILIKE can be used instead of LIKE to make the match case-insensitive according to the active locale."
For other text, converting
can be a workaround.
PostgreSQL is an awesome databases, I really love both the product and the community. Hopefully this post has helped clarify some of the details about how Postgres works, and provided you with resources to each topic.
A few more links, just in case...
If you have questions, comments, or need help with your databases, contact us!
Published July 30, 2018
Last Updated January 09, 2019