RustProof Labs: blogging for education (logo)

PostgreSQL Crash Course

By Ryan Lambert -- Published July 30, 2018

PostgreSQL (postgres, pg) is a powerful, open-source relational database (RMDS). It is the database of choice here at RustProof Labs, I wrote previously why I chose PostgreSQL over MySQL. This post provides a 10-mile view of how PostgreSQL is structured, tools to use, and other important concepts. The goal is to provide a broad view of PostgreSQL with resources to more detail on each topic, without getting caught up in too much technical jargon.

This post is part of the series PostgreSQL: From Idea to Database.

Installing PostgreSQL

PostgreSQL will run on most any modern hardware and operating systems. PostgreSQL has installation guides for a large number of operating systems, supporting most Linux/Unix, Mac, and Windows. As for hardware, it will run on nearly anything modern, including a Raspberry Pi (ARM processor). See the PostgreSQL supported platforms for more details.

I have successfully ran PostgreSQL on Linux servers of all sizes, from tiny to powerful. I previously wrote about load testing PostgreSQL that gives more detail about some of the systems it can run on and what type of performance to expect.

Structure and Terminology

Every RDMS has its own way to organize the elements required for a reliable database system. PostgreSQL has the following hierarchy to organize the required elements.

Cluster > Database > Schema > Table > Data

Data is stored as rows, contained in tables, within a schema, within a database, within a cluster. The PostgreSQL Concepts page provides a nice explanation of this overall structure.

Architecture

When I talk about PostgreSQL I'm talking about PostgreSQL, the database server, which also may be referred to as a cluster or instance. Yes, PostgreSQL is server software. For production databases, Postgres is often install on a dedicated host (hardware or VM), though for development and testing it is possible to install PostgreSQL on your desktop or laptop computer.

When you use tools like psql or PgAdmin to connect to a PostgreSQL server, you are using client software designed to interface with existing PostgreSQL servers (instances). This is the other half of the client/server model used by PostgreSQL (and most other reputable databases).

See the PostgreSQL client tools section for more details.

Cluster

A PostgreSQL cluster is what is commonly called a "PostgreSQL server." The database cluster handles connections, authentications, executing queries, and so on; it is the backbone for the remainder of functionality. From the PostgreSQL docs:

"A database cluster is a collection of databases that is managed by a single instance of a running database server."

Login users and groups (ROLEs) are managed at the cluster level, though the security of objects and data can be granted in much finer detail at the database, schema, table and even row level.

Database

A database in PostgreSQL provides a logical grouping of objects within a PostgreSQL cluster. Each database contains its own tables, data and more. Connections to a PostgreSQL cluster will specify which database to use for the connection, determining what data is available.

One detail of note with PostgreSQL is that cross-database queries are not as simple as referencing a table in another database. Some databases, such as MS SQL Server, allow cross-database queries by using the fully qualified name including the database's name. The following SQL code references two databases, db1 and db2.

SELECT *
    FROM db1.schema.table1 t1
    INNER JOIN db2.schema.table2 t2 ON t1.id = t2.id

In PostgreSQL, cross-database queries are not possible without the use of an extension, such as foreign data wrappers (FDW).

Schema

Schema's in PostgreSQL provide a way to separate database objects into functional groupings. PostgreSQL databases have a schema named public by default. In general, I prefer to avoid putting much in the public schema, though I do put a few common database tables get there. When I use a calendar or time table, those go in the public schema.

One major benefit of schemas is they provide an easy to grant and/or deny access to database objects (tables, views, functions, etc) to groups of users based on their functional role. An example of allowing a users with the sales_group to execute all functions in the sales schema.

GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA sales TO sales_group;

Table

A table is a collection of structured data seen as columns and rows. The columns of a table define the structure of the data to be stored. The rows of a table are what most people would think of as "the data." If you work with data in spreadsheets, you've probably seen a decent representation of a table in PostgreSQL.

The following SQL code would create a table in the sales schema named location with two columns.

CREATE TABLE sales.location
(
    location_id SERIAL NOT NULL PRIMARY KEY,
    name TEXT NOT NULL
);

The SERIAL data type in PostgreSQL essentially creates an auto-incrementing INTEGER.

This code would insert a row for a location named 'Headquarters' into the sales.location table.

INSERT INTO sales.location (name) VALUES ('Headquarters');

PostgreSQL Client Tools

Learn how to use psql. Hands down, psql is the most important tool to learn for working with PostgreSQL. I know, it's a command line tool, and that can take some getting used to. Even if you aren't guru of using the command line, this is worthy of your time investment to learn. One of the mistakes I made early on was not learning how to customize psql to my liking.

Hands down, psql is the most important tool to learn for working with PostgreSQL.

GUI Tools

There are times when psql isn't ideal. A good example is when I'm working with spatial data using PostGIS and I need to visualize the results to help verify the results... PgAdmin 4's geometry viewer rocks for that!

There are a plethora of GUI software options that work with PostgreSQL. Each of the GUI options for PostgreSQL has its own pros and cons to different users. A few options I have used are listed here.

PgAdmin 4 gets a lot of disdain from the community it seems because of its browser based interface. I think the only reason I have any complaint with that UI choice is my constant use of F5, which I use to execute queries against the database... but it also refreshes your browser and that can cause accidental data loss of your query!

Valentina Studio has both free and paid versions. I use the free version mostly for creating relationship diagrams of existing databases to include in internal documentation. OmniDb is the one I have the least experience with, not for any reason but I discovered it last.

psql is my go-to choice when working with PostgreSQL.

Extensions

PostgreSQL extensions are a big part of what makes PostgreSQL so awesome. Extensions provide a way to enable custom functionality in PostgreSQL very simply and cleanly. For example, the PostGIS extension allows any PostgreSQL database to turn into a powerful, GIS-enabled database.

CREATE EXTENSION postgis;

There are quite a few extensions included with PostgreSQL's contrib modules.

Backing up and restoring databases

Unless you don't care at all about your data, you must have a proper backup and restore plan. The best I've found out there is pgBackRest. They have a thorough user guide that walks you step-by-step through the process. You can setup a dedicated backup server, schedule backups (cron), and have a solid backup and restore system in a few hours.

For other backup needs, there are these command line utilities that should be installed with PostgreSQL by default.

Replication

Most non-trivial database implementations should consider implementing streaming replication. This provides a relatively straight-forward way to keep one or more secondary servers available with an almost real-time copy of your databases. See the PostgreSQL Wiki for more options regarding replication, clustering and connection pooling.

Security and Permissions

PostgreSQL users login users and user groups, each called a ROLE with specific attributes. It's generally best to assign permissions to a user group for a specific function and assign users to that group to give the appropriate access. PostgreSQL uses the pg_hba.conf file to define which connections are allowed to the server and the methods used for authentication. Read about these concepts in more detail.

Viewing Settings

A neat feature with Postgres is you can easily see most, if not all, server settings when you are connected to the database in a client tool, such as psql. The command SHOW <setting name>; will return the currently set value. The following command shows how to view the shared_buffers setting, commonly changes during performance tuning.

SHOW shared_buffers;
┌────────────────┐
│ shared_buffers │
╞════════════════╡
│ 400MB          │
└────────────────┘

Gotcha: Case sensitive by default

Users coming to PostgreSQL from other databases get caught by this quite often. Your text in your database is case sensitive by default, meaning Test <> test <> TEST. (<> means not equal) This is really important with things like usernames and email addresses, but you can easily get unexpected results with any text field.

SELECT 'Test' = 'Test' AS same, 'Test' = 'TEST' AS diff;
┌──────┬──────┐
│ same │ diff │
╞══════╪══════╡
│ t    │ f    │
└──────┴──────┘

If you're using a LIKE comparison you can use ILIKE instead. "The key word ILIKE can be used instead of LIKE to make the match case-insensitive according to the active locale."

For other text, converting data to LOWER() can be a workaround.

Additional Resources

PostgreSQL is an awesome databases, I really love both the product and the community. Hopefully this post has helped clarify some of the details about how Postgres works, and provided you with resources to each topic.

A few more links, just in case...

If you have questions, comments, or need help with your databases, contact us!

By Ryan Lambert
Published July 30, 2018
Last Updated January 09, 2019