Audit Data with Triggers: PGSQL Phriday #007
Welcome to another #PGSQLPhriday post! This month's host is Lætitia Avrot, who picked the topic of Triggers with these questions:
"Do you love them? Do you hate them? Do you sometimes love them sometimes hate them? And, most importantly, why? Do you know legitimate use cases for them? How to mitigate their drawbacks (if you think they have any)?"
Let's dive in!
Triggers are a specialized tool
I rarely use triggers. I don't hate them; I just think they should be used sparingly. Like any specialized tool, you should not expect to use triggers for every occasion where they could be used.
However... there is one notable use case where I really like triggers: audit tables.
Part of the magic of using triggers for auditing data changes in Postgres is the JSON/JSONB support available.
Reasons to audit
There are three main reasons I implement auditing in databases.
- Compliance
- Troubleshooting
- Oops protection
Each of these scenarios benefits from having a clear trail of data modifications over time. They also share a common reality: once you start tracking history in one place, you (or the business) will realize you now need the same functionality for a handful of other tables. This means the solution needs to be easy to scale. It also needs to be easy to clean up and disable if it becomes unnecessary in the future.
Umair Shahid's post for this #PGSQLPhriday provides a great overview of pros, cons, and things to watch out for with triggers.
Auditing with triggers
Database triggers provide action logic when data-modifying statements (e.g. INSERT, UPDATE) are run against the defined table.
Many databases I work with use a generic audit table that tracks where the change happened (schema, table) and what the row data was in that version.
Combine this with Postgres' JSONB support and it's easy to build a generic auditing system that plugs into a new database; once included, it is trivial to add or remove auditing on other tables.
Tables to track
For this example, we'll create widget and widget_sales tables.
CREATE TABLE widget (
id BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name TEXT NOT NULL,
description TEXT NOT NULL,
price_usd NUMERIC NOT NULL
);
CREATE TABLE widget_sales (
id BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
widget_id BIGINT NOT NULL REFERENCES widget (id),
unit_count BIGINT NOT NULL,
total_price_usd NUMERIC,
comments TEXT
);
Audit table
To audit the data in our tables, we need a place to store the information. The generic audit table I use is created in the following example. The first two columns are a simple primary key and a timestamp to track when each action happens. The t_name and s_name columns track the source of the data, and the action column tracks INSERT/UPDATE/DELETE/etc. The actual data from the table being audited will get stuffed into the row_data JSONB column.
CREATE TABLE audit (
id BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY,
audit_time timestamptz NOT NULL DEFAULT now(),
t_name text NOT NULL,
s_name text NOT NULL,
"action" text NOT NULL,
row_data JSONB NOT NULL,
CONSTRAINT pk_audit_id PRIMARY KEY (id)
);
The audit table intentionally does not have additional indexes on it. Every index on a table slows down INSERTs to that table, and we do not want the audit table to be slow on that operation. This decision potentially makes querying the audit table slower later, but the intent of this table is not regular querying; it's for occasional lookups.
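If those occasional lookups ever grow into routine reporting, an index can always be added later, accepting the extra write overhead at that point. A hypothetical example (the index name and column choices are mine, not part of the setup above):
-- Hypothetical: add only if audit lookups by source table become routine.
-- Every index here makes the trigger's INSERT a bit slower.
CREATE INDEX ix_audit_t_name_time ON audit (t_name, audit_time);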
Audit function
The following CREATE FUNCTION block creates a generic function to track INSERT and UPDATE statements. The audit_upsert() function shown here does not track deleted records.
Using row_to_json(NEW) allows ignoring the specifics of more nuanced data structures as an easy compromise. This detail means some specialized data types might not be handled well.
CREATE FUNCTION audit_upsert()
    RETURNS trigger
    LANGUAGE plpgsql
AS $function$
DECLARE record JSON;
BEGIN
    -- Serialize the new version of the row to JSON
    record := row_to_json(NEW);
    -- TG_* variables identify the table, schema, and operation
    -- that fired the trigger
    INSERT INTO audit (t_name, s_name, action, row_data)
        SELECT TG_TABLE_NAME, TG_TABLE_SCHEMA, TG_OP, record;
    -- Return value is ignored for AFTER triggers
    RETURN NULL;
END;
$function$
;
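Since audit_upsert() does not track deleted records, a companion function could record OLD (the row being removed) instead of NEW. The audit_delete() function below is a hypothetical sketch, not part of the setup used in the rest of this post:
-- Hypothetical companion function to capture DELETEs.
-- DELETE triggers expose OLD (the removed row); NEW is not available.
CREATE FUNCTION audit_delete()
    RETURNS trigger
    LANGUAGE plpgsql
AS $function$
BEGIN
    INSERT INTO audit (t_name, s_name, action, row_data)
        SELECT TG_TABLE_NAME, TG_TABLE_SCHEMA, TG_OP, row_to_json(OLD);
    RETURN NULL;
END;
$function$
;
It would attach with CREATE TRIGGER ... AFTER DELETE ... FOR EACH ROW, just like the examples below.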
The audit_upsert function can now be used with CREATE TRIGGER to track data changes to the specified table.
-- Add insert/update tracking to widget
CREATE TRIGGER audit_upsert AFTER INSERT OR UPDATE
ON widget
FOR EACH ROW EXECUTE FUNCTION audit_upsert()
;
-- Add insert/update tracking to widget_sales
CREATE TRIGGER audit_upsert AFTER INSERT OR UPDATE
ON widget_sales
FOR EACH ROW EXECUTE FUNCTION audit_upsert()
;
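As an optional sanity check, the pg_trigger catalog confirms both triggers are in place:
-- Confirm the audit triggers exist. tgisinternal filters out
-- system-generated triggers, such as those backing foreign keys.
SELECT tgname, tgrelid::regclass AS table_name
FROM pg_trigger
WHERE NOT tgisinternal
    AND tgname = 'audit_upsert'
;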
Add records
The above examples created a couple of basic tables and the objects needed for auditing. The following INSERT query adds two records to the widget table to test the audit triggers.
INSERT INTO widget (name, description, price_usd)
VALUES ('this-thing', 'It is this.', 9.99),
('that-thing', 'This is that.', 4.99)
;
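As an aside, you can preview the exact shape the triggers capture by running row_to_json() directly against the table. This query is only an illustration, not part of the audit setup:
-- Illustration: serialize widget rows the same way the trigger does.
SELECT row_to_json(w)
FROM widget w
;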
The next example adds some data to the widget_sales table to record sales of 7 of this-thing and 42 of that-thing.
INSERT INTO widget_sales (widget_id, unit_count, total_price_usd)
SELECT id AS widget_id, 7, 7 * price_usd AS total_price_usd
FROM widget
WHERE name = 'this-thing'
UNION
SELECT id AS widget_id, 42, 42 * price_usd AS total_price_usd
FROM widget
WHERE name = 'that-thing'
;
Make a change to the data in the widget table to increase the price and update the description. Note the UPDATE below has no WHERE clause, so it intentionally changes every row in widget.
UPDATE widget
SET price_usd = price_usd + (price_usd * .1),
description = description || ' Price increase for extra awesomeness.'
;
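Before digging into row-level details in the next section, a quick aggregate can confirm the triggers fired for each operation. This check is optional:
-- Count captured audit rows per source table and action.
SELECT t_name, action, COUNT(*)
FROM audit
GROUP BY t_name, action
ORDER BY t_name, action
;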
Query the audit table
Query the audit table to see the details tracked. The results below show the first 2 records inserted.
SELECT id, audit_time, t_name, action, jsonb_pretty(row_data)
FROM audit
ORDER BY id
LIMIT 2
;
┌────┬───────────────────────────────┬────────┬────────┬────────────────────────────────────┐
│ id │ audit_time │ t_name │ action │ jsonb_pretty │
╞════╪═══════════════════════════════╪════════╪════════╪════════════════════════════════════╡
│ 1 │ 2023-04-07 02:19:20.378905+00 │ widget │ INSERT │ { ↵│
│ │ │ │ │ "id": 1, ↵│
│ │ │ │ │ "name": "this-thing", ↵│
│ │ │ │ │ "price_usd": 9.99, ↵│
│ │ │ │ │ "description": "It is this." ↵│
│ │ │ │ │ } │
│ 2 │ 2023-04-07 02:19:20.378905+00 │ widget │ INSERT │ { ↵│
│ │ │ │ │ "id": 2, ↵│
│ │ │ │ │ "name": "that-thing", ↵│
│ │ │ │ │ "price_usd": 4.99, ↵│
│ │ │ │ │ "description": "This is that."↵│
│ │ │ │ │ } │
└────┴───────────────────────────────┴────────┴────────┴────────────────────────────────────┘
The next query searches for all versions of the row in the widget table with id = 1. Data extracted from JSONB with ->> comes back as text, so the id is cast back to BIGINT.
The results show the original row with action = 'INSERT' and the updated record with the new price and description.
SELECT s_name, t_name, action, row_data
FROM audit
WHERE t_name = 'widget'
AND (row_data->>'id')::BIGINT = 1
;
┌────────┬────────┬────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ s_name │ t_name │ action │ row_data │
╞════════╪════════╪════════╪═════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ public │ widget │ INSERT │ {"id": 1, "name": "this-thing", "price_usd": 9.99, "description": "It is this."} │
│ public │ widget │ UPDATE │ {"id": 1, "name": "this-thing", "price_usd": 10.989, "description": "It is this. Price increase for extra a…│
│ │ │ │…wesomeness"} │
└────────┴────────┴────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
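This version history is also what makes the "Oops protection" scenario work. As a hypothetical recovery sketch (not something run in this post), jsonb_populate_record() can turn an audited copy back into a typed widget row:
-- Hypothetical recovery: rebuild the original widget row from its
-- oldest audit copy. jsonb_populate_record() maps the JSONB keys
-- back onto widget's columns.
SELECT (jsonb_populate_record(NULL::widget, row_data)).*
FROM audit
WHERE t_name = 'widget'
    AND (row_data->>'id')::BIGINT = 1
ORDER BY audit_time
LIMIT 1
;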
Tracking changes
The following query shows how you could track changes to the price of widgets over time using the LAG() window function.
WITH x AS (
SELECT row_data->>'id' AS id, row_data->>'price_usd' AS cur_price_usd,
-- Calculate price difference from previous row version
(row_data->>'price_usd')::NUMERIC
- LAG((row_data->>'price_usd')::NUMERIC) OVER (
PARTITION BY row_data->>'id'
ORDER BY audit_time
) AS price_diff,
-- How long between this version and the previous row version?
audit_time - LAG(audit_time) OVER (
PARTITION BY row_data->>'id'
ORDER BY audit_time
) AS duration_since_prior,
-- Used to later filter down to only the latest version
ROW_NUMBER() OVER (
PARTITION BY row_data->>'id'
ORDER BY audit_time DESC
) AS rnk
FROM audit
WHERE t_name = 'widget'
)
SELECT id, cur_price_usd, price_diff, duration_since_prior
FROM x
WHERE rnk = 1
;
The results from the above query are below. This shows how you can easily query across versions of a specific record using window functions.
┌────┬───────────────┬────────────┬──────────────────────┐
│ id │ cur_price_usd │ price_diff │ duration_since_prior │
╞════╪═══════════════╪════════════╪══════════════════════╡
│ 1 │ 10.989 │ 0.999 │ @ 52.897558 secs │
│ 2 │ 5.489 │ 0.499 │ @ 52.897558 secs │
└────┴───────────────┴────────────┴──────────────────────┘
Remove auditing
After going through the above examples, it has become apparent that the auditing on widget_sales was not necessary. Those records follow an INSERT-only pattern, so auditing does not add much benefit. The auditing of this table can be removed by dropping its trigger.
DROP TRIGGER audit_upsert ON widget_sales;
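If the history already captured for widget_sales is not needed either, those rows can be deleted from the audit table as well (optional):
-- Optional cleanup: discard audit history captured for widget_sales.
DELETE FROM audit
WHERE t_name = 'widget_sales'
;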
Summary
The auditing approach shown here is a good use of database triggers. You must be careful with triggers like this: using them everywhere on large, active databases would likely have a significant negative impact on your overall database performance. You also don't want to keep data like this forever unless you must. Even if you must keep it for a while, it should have an expiration defined in policy.
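As a hypothetical example of such a policy, a scheduled job could purge old rows; the one-year window below is an assumption, not a recommendation:
-- Hypothetical retention job: purge audit rows older than 1 year.
DELETE FROM audit
WHERE audit_time < now() - INTERVAL '1 year'
;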
Thanks again, Lætitia, for a great topic! I look forward to reading the other contributions this month.
Need help with your PostgreSQL servers or databases? Contact us to start the conversation!