Audit Data with Triggers: PGSQL Phriday #007
Welcome to another #PGSQLPhriday post! This month's host is Lætitia Avrot, who picked the topic of Triggers with these questions:
"Do you love them? Do you hate them? Do you sometimes love them sometimes hate them? And, most importantly, why? Do you know legitimate use cases for them? How to mitigate their drawbacks (if you think they have any)?"
Let's dive in!
Triggers are a specialized tool
I rarely use triggers. I don't hate them; I just think they should be used sparingly. Like any specialized tool, you should not expect to use triggers for every occasion where they could be used.
However... there is one notable use case where I really like triggers: audit tables.
Part of the magic of using triggers for auditing data changes in Postgres is the JSON/JSONB support available.
Reasons to audit
There are three main reasons I implement auditing in databases.
- Compliance
- Troubleshooting
- Oops protection
Each of these scenarios benefits from having a clear trail of data modifications over time. They also share a common reality: once you start tracking history in one place, you (or the business) will realize you now need the same functionality for a handful of other tables. This means the solution needs to be easy to scale. It also needs to be easy to clean up and disable if it becomes unnecessary in the future.
Umair Shahid's post for this #PGSQLPhriday provides a great overview of pros, cons, and things to watch out for with triggers.
Auditing with triggers
Database triggers provide action logic when data-modifying statements (e.g. INSERT, UPDATE) are run against the defined table.
Many databases I work with use a generic audit table that tracks where the change happened (schema, table) and what the row data was in that version.
Combine this with Postgres' JSONB support and it's easy to build a generic auditing system that plugs into a new database; once included, it is trivial to add or remove auditing on other tables.
Tables to track
For this example, we'll create widget and widget_sales tables.
CREATE TABLE widget (
id BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name TEXT NOT NULL,
description TEXT NOT NULL,
price_usd NUMERIC NOT NULL
);
CREATE TABLE widget_sales (
id BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
widget_id BIGINT NOT NULL REFERENCES widget (id),
unit_count BIGINT NOT NULL,
total_price_usd NUMERIC,
comments TEXT
);
Audit table
To audit the data in our tables, we need a place to store the information. The generic audit table I use is created in the following example. The first two columns are a simple primary key and a timestamp to track when each action happens. The t_name and s_name columns track the source of the data, and the action column tracks INSERT/UPDATE/DELETE/etc. The actual data from the table being audited will get stuffed into the row_data JSONB column.
CREATE TABLE audit (
id BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY,
audit_time timestamptz NOT NULL DEFAULT now(),
t_name text NOT NULL,
s_name text NOT NULL,
"action" text NOT NULL,
row_data JSONB NOT NULL,
CONSTRAINT pk_audit_id PRIMARY KEY (id)
);
The audit table intentionally does not have additional indexes on it. Every index on a table slows down INSERTs to that table, and we do not want the audit table to be slow on that operation. This decision potentially makes querying the audit table slower later, but the intent of this table is not regular querying; it's for occasional lookups.
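If those occasional lookups ever grow into routine reporting, an index can always be added later, accepting the extra write overhead at that point. A hypothetical example (the index name and column choices are mine, not part of the setup above):
-- Hypothetical: add only if audit lookups by source table become routine.
-- Every index here makes the trigger's INSERT a bit slower.
CREATE INDEX ix_audit_t_name_time ON audit (t_name, audit_time);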
Audit function
The following CREATE FUNCTION block creates a generic function to track INSERT and UPDATE statements. The audit_upsert() function shown here does not track deleted records.
Using row_to_json(NEW) allows ignoring the specifics of more nuanced data structures as an easy compromise. This detail means some specialized data types might not be handled well.
CREATE FUNCTION audit_upsert()
    RETURNS trigger
    LANGUAGE plpgsql
AS $function$
DECLARE record JSON;
BEGIN
    -- Serialize the new version of the row to JSON
    record := row_to_json(NEW);
    -- TG_* variables identify the table, schema, and operation
    -- that fired the trigger
    INSERT INTO audit (t_name, s_name, action, row_data)
        SELECT TG_TABLE_NAME, TG_TABLE_SCHEMA, TG_OP, record;
    -- Return value is ignored for AFTER triggers
    RETURN NULL;
END;
$function$
;
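Since audit_upsert() does not track deleted records, a companion function could record OLD (the row being removed) instead of NEW. The audit_delete() function below is a hypothetical sketch, not part of the setup used in the rest of this post:
-- Hypothetical companion function to capture DELETEs.
-- DELETE triggers expose OLD (the removed row); NEW is not available.
CREATE FUNCTION audit_delete()
    RETURNS trigger
    LANGUAGE plpgsql
AS $function$
BEGIN
    INSERT INTO audit (t_name, s_name, action, row_data)
        SELECT TG_TABLE_NAME, TG_TABLE_SCHEMA, TG_OP, row_to_json(OLD);
    RETURN NULL;
END;
$function$
;
It would attach with CREATE TRIGGER ... AFTER DELETE ... FOR EACH ROW, just like the examples below.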
The audit_upsert function can now be used with CREATE TRIGGER to track data changes to the specified table.
-- Add insert/update tracking to widget
CREATE TRIGGER audit_upsert AFTER INSERT OR UPDATE
ON widget
FOR EACH ROW EXECUTE FUNCTION audit_upsert()
;
-- Add insert/update tracking to widget_sales
CREATE TRIGGER audit_upsert AFTER INSERT OR UPDATE
ON widget_sales
FOR EACH ROW EXECUTE FUNCTION audit_upsert()
;
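As an optional sanity check, the pg_trigger catalog confirms both triggers are in place:
-- Confirm the audit triggers exist. tgisinternal filters out
-- system-generated triggers, such as those backing foreign keys.
SELECT tgname, tgrelid::regclass AS table_name
FROM pg_trigger
WHERE NOT tgisinternal
    AND tgname = 'audit_upsert'
;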
Add records
The above examples created a couple of basic tables and the objects needed for auditing. The following INSERT query adds two records to the widget table to test the audit triggers.
INSERT INTO widget (name, description, price_usd)
VALUES ('this-thing', 'It is this.', 9.99),
('that-thing', 'This is that.', 4.99)
;
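As an aside, you can preview the exact shape the triggers capture by running row_to_json() directly against the table. This query is only an illustration, not part of the audit setup:
-- Illustration: serialize widget rows the same way the trigger does.
SELECT row_to_json(w)
FROM widget w
;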
The next example adds some data to the widget_sales table to record sales of 7 of this-thing and 42 of that-thing.
INSERT INTO widget_sales (widget_id, unit_count, total_price_usd)
SELECT id AS widget_id, 7, 7 * price_usd AS total_price_usd
FROM widget
WHERE name = 'this-thing'
UNION
SELECT id AS widget_id, 42, 42 * price_usd AS total_price_usd
FROM widget
WHERE name = 'that-thing'
;
Make a change to the data in the widget table to increase the price and update the description. Note the UPDATE below has no WHERE clause, so it intentionally changes every row in widget.
UPDATE widget
SET price_usd = price_usd + (price_usd * .1),
description = description || ' Price increase for extra awesomeness.'
;
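Before digging into row-level details in the next section, a quick aggregate can confirm the triggers fired for each operation. This check is optional:
-- Count captured audit rows per source table and action.
SELECT t_name, action, COUNT(*)
FROM audit
GROUP BY t_name, action
ORDER BY t_name, action
;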
Query the audit table
Query the audit table to see the details tracked. The results below show the first 2 records inserted.
SELECT id, audit_time, t_name, action, jsonb_pretty(row_data)
FROM audit
ORDER BY id
LIMIT 2
;
┌────┬───────────────────────────────┬────────┬────────┬────────────────────────────────────┐
│ id │ audit_time │ t_name │ action │ jsonb_pretty │
╞════╪═══════════════════════════════╪════════╪════════╪════════════════════════════════════╡
│ 1 │ 2023-04-07 02:19:20.378905+00 │ widget │ INSERT │ { ↵│
│ │ │ │ │ "id": 1, ↵│
│ │ │ │ │ "name": "this-thing", ↵│
│ │ │ │ │ "price_usd": 9.99, ↵│
│ │ │ │ │ "description": "It is this." ↵│
│ │ │ │ │ } │
│ 2 │ 2023-04-07 02:19:20.378905+00 │ widget │ INSERT │ { ↵│
│ │ │ │ │ "id": 2, ↵│
│ │ │ │ │ "name": "that-thing", ↵│
│ │ │ │ │ "price_usd": 4.99, ↵│
│ │ │ │ │ "description": "This is that."↵│
│ │ │ │ │ } │
└────┴───────────────────────────────┴────────┴────────┴────────────────────────────────────┘
The next query searches for all versions of the row in the widget table with id = 1. Data extracted from JSONB with ->> comes back as text, so the id is cast back to BIGINT.
The results show the original row with action = 'INSERT' and the updated record with the new price and description.
SELECT s_name, t_name, action, row_data
FROM audit
WHERE t_name = 'widget'
AND (row_data->>'id')::BIGINT = 1
;
┌────────┬────────┬────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ s_name │ t_name │ action │ row_data │
╞════════╪════════╪════════╪═════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ public │ widget │ INSERT │ {"id": 1, "name": "this-thing", "price_usd": 9.99, "description": "It is this."} │
│ public │ widget │ UPDATE │ {"id": 1, "name": "this-thing", "price_usd": 10.989, "description": "It is this. Price increase for extra a…│
│ │ │ │…wesomeness"} │
└────────┴────────┴────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
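This version history is also what makes the "Oops protection" scenario work. As a hypothetical recovery sketch (not something run in this post), jsonb_populate_record() can turn an audited copy back into a typed widget row:
-- Hypothetical recovery: rebuild the original widget row from its
-- oldest audit copy. jsonb_populate_record() maps the JSONB keys
-- back onto widget's columns.
SELECT (jsonb_populate_record(NULL::widget, row_data)).*
FROM audit
WHERE t_name = 'widget'
    AND (row_data->>'id')::BIGINT = 1
ORDER BY audit_time
LIMIT 1
;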
Tracking changes
The following query shows how you could track changes to the price of widgets over time using the LAG() window function.
WITH x AS (
SELECT row_data->>'id' AS id, row_data->>'price_usd' AS cur_price_usd,
-- Calculate price difference from previous row version
(row_data->>'price_usd')::NUMERIC
- LAG((row_data->>'price_usd')::NUMERIC) OVER (
PARTITION BY row_data->>'id'
ORDER BY audit_time
) AS price_diff,
-- How long between this version and the previous row version?
audit_time - LAG(audit_time) OVER (
PARTITION BY row_data->>'id'
ORDER BY audit_time
) AS duration_since_prior,
-- Used to later filter down to only the latest version
ROW_NUMBER() OVER (
PARTITION BY row_data->>'id'
ORDER BY audit_time DESC
) AS rnk
FROM audit
WHERE t_name = 'widget'
)
SELECT id, cur_price_usd, price_diff, duration_since_prior
FROM x
WHERE rnk = 1
;
The results from the above query are below. This shows how you can easily query across versions of a specific record using window functions.
┌────┬───────────────┬────────────┬──────────────────────┐
│ id │ cur_price_usd │ price_diff │ duration_since_prior │
╞════╪═══════════════╪════════════╪══════════════════════╡
│ 1 │ 10.989 │ 0.999 │ @ 52.897558 secs │
│ 2 │ 5.489 │ 0.499 │ @ 52.897558 secs │
└────┴───────────────┴────────────┴──────────────────────┘
Remove auditing
After going through the above examples, it has become apparent that the auditing on widget_sales was not necessary. Those records follow an INSERT-only pattern, so auditing does not add much benefit. The auditing of this table can be removed by dropping its trigger.
DROP TRIGGER audit_upsert ON widget_sales;
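If the history already captured for widget_sales is not needed either, those rows can be deleted from the audit table as well (optional):
-- Optional cleanup: discard audit history captured for widget_sales.
DELETE FROM audit
WHERE t_name = 'widget_sales'
;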
Summary
The auditing approach shown here is a good use of database triggers. You must be careful with triggers like this: using them everywhere on large, active databases would likely have a significant negative impact on your overall database performance. You also don't want to keep data like this forever unless you must. Even if you must keep it for a while, it should have an expiration defined in policy.
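As a hypothetical example of such a policy, a scheduled job could purge old rows; the one-year window below is an assumption, not a recommendation:
-- Hypothetical retention job: purge audit rows older than 1 year.
DELETE FROM audit
WHERE audit_time < now() - INTERVAL '1 year'
;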
Thanks again, Lætitia, for a great topic! I look forward to reading the other contributions this month.
Need help with your PostgreSQL servers or databases? Contact us to start the conversation!