Load and query Pi-hole data from Postgres
I have used Pi-hole on our local network for a
few years now. It runs on a dedicated Raspberry Pi 3B attached
to the router (Netgear Nighthawk) to provide fast local DNS/DHCP while
blocking ads at the network level.
The built-in Pi-hole web interface allows for some basic querying/reporting
of the collected data, but it's a bit limited and quite slow as the data
grows over time. My current pihole-FTL.db
database is 1.4 GB and contains 12 months of data.
$ ls -alh /etc/pihole/pihole-FTL.db
-rw-r--r-- 1 pihole pihole 1.4G Jan 31 14:04 pihole-FTL.db
Pi-hole saves its data in a few SQLite databases, with the FTL database (Faster Than Light) being the most interesting. While I could try to work with the data directly in SQLite, I strongly prefer Postgres and decided this was a great time to give the pgspider/sqlite_fdw extension a try. This post goes over the steps I took to bring Pi-hole data into Postgres from its SQLite data source.
See my previous post on using
file_fdw
for more about Postgres' Foreign Data Wrappers.
Setup
Back up the SQLite database and move it off the Pi. I saved the backup file
to my laptop, where I have Postgres 13 installed, at
/data/pihole/pihole-FTL--20210131.db.
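One way to do this (a rough sketch; the temporary path and the Pi's address are placeholders) is to take an online copy with sqlite3's .backup command on the Pi, then pull the copy down to the laptop.
# On the Pi: take a consistent copy of the live database
$ sqlite3 /etc/pihole/pihole-FTL.db ".backup /tmp/pihole-FTL--20210131.db"
# On the laptop: pull the copy down (replace pi@raspberrypi.lan with your Pi's user and address)
$ scp pi@raspberrypi.lan:/tmp/pihole-FTL--20210131.db /data/pihole/pihole-FTL--20210131.db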
Next, make the sqlite_fdw extension available. This extension is not part of
the standard Postgres installation, so install it on the server per the repo's
instructions (e.g. clone, make, sudo make install).
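Roughly, that looks like the following; the exact commands and any required build flags depend on your environment, so follow the repo's README.
$ git clone https://github.com/pgspider/sqlite_fdw.git
$ cd sqlite_fdw
$ make
$ sudo make install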
Create a pihole
database and connect to it via psql
(or your favorite GUI,
e.g. DBeaver).
psql -c "CREATE DATABASE pihole;"
psql -d pihole
Create the extension that will let us query the SQLite database directly
from Postgres. Also create the FDW server named pihole_backup_server
pointing at the backup .db
file.
CREATE EXTENSION sqlite_fdw;
CREATE SERVER pihole_backup_server
FOREIGN DATA WRAPPER sqlite_fdw
OPTIONS (DATABASE '/data/pihole/pihole-FTL--20210131.db')
;
I use schemas to organize data instead of putting data in the default
public
schema.
CREATE SCHEMA pihole;
COMMENT ON SCHEMA pihole IS 'Data loaded from Pi-hole.';
Foreign Table
The data I am most interested in is the table named queries.
Lucky for me,
the Pi-hole docs do a good job
describing the table structure.
Another way to see the table structure is to use sqlite3
and .schema queries
to get the code.
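For example, run it against the backup copy; the columns it reports should match the foreign table definition below.
$ sqlite3 /data/pihole/pihole-FTL--20210131.db
sqlite> .schema queries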
The foreign table is created to match the structure in SQLite. Make sure to use the same column names and data types to avoid errors.
CREATE FOREIGN TABLE pihole.query_fdw
(
id BIGINT NOT NULL,
timestamp BIGINT NOT NULL,
type INT NOT NULL,
status INT NOT NULL,
domain TEXT NOT NULL,
client TEXT NOT NULL,
forward TEXT NULL,
additional_info TEXT NULL
)
SERVER pihole_backup_server
OPTIONS (TABLE 'queries')
;
COMMENT ON FOREIGN TABLE pihole.query_fdw IS 'Linked to Pi-hole query table of local backup';
With the pihole.query_fdw
foreign table created, attempt a simple query.
This is helpful to ensure the foreign table was created successfully
and to start getting an
idea of what the data looks like.
SELECT * FROM pihole.query_fdw LIMIT 1;
┌──────────┬────────────┬──────┬────────┬────────────┬──────────────┬─────────┬─────────────────┐
│    id    │ timestamp  │ type │ status │   domain   │    client    │ forward │ additional_info │
╞══════════╪════════════╪══════╪════════╪════════════╪══════════════╪═════════╪═════════════════╡
│ 19493268 │ 1580569203 │    1 │      3 │ tag100.lan │ 172.16.0.240 │ ¤       │ ¤               │
└──────────┴────────────┴──────┴────────┴────────────┴──────────────┴─────────┴─────────────────┘
Exploring the data
The Pi-hole documentation told us the timestamp
column is a Unix timestamp.
We can convert this epoch to a Postgres timestamptz
value with the
to_timestamp()
function. The Pi generating this data has its timezone set to
America/Denver
. Informing Postgres of the timezone is important to ensure
data lines up properly with expectations.
SELECT TO_TIMESTAMP(timestamp) AT TIME ZONE 'America/Denver' AS ts
FROM pihole.query_fdw
WHERE id = 19493268;
┌────────────────────────┐
│           ts           │
╞════════════════════════╡
│ 2020-02-01 08:00:03    │
└────────────────────────┘
Another query to look at the row count and range of dates included. At this point
it's important to validate that the max_ts
lines up with when the backup
was taken. If there was a problem setting the timezone properly it should
be apparent here.
SELECT COUNT(*),
MIN(to_timestamp(timestamp))
AT TIME ZONE 'America/Denver' AS min_ts,
MAX(to_timestamp(timestamp))
AT TIME ZONE 'America/Denver' AS max_ts
FROM pihole.query_fdw
;
┌──────────┬─────────────────────┬─────────────────────┐
│  count   │       min_ts        │       max_ts        │
╞══════════╪═════════════════════╪═════════════════════╡
│ 19386278 │ 2020-02-01 08:00:03 │ 2021-01-31 08:57:06 │
└──────────┴─────────────────────┴─────────────────────┘
Load into Postgres
With more than 19 million rows in the foreign table, performance will not
be as good as I want. Relying on the FDW would also require keeping the SQLite
database file around instead of just relying on my Postgres backups.
Create a normal table for long-term storage and bring the data over from
the foreign table. This permanent table adds a PRIMARY KEY
to the id column and renames some other columns:
timestamp to ts, type to query_type_id, and status to status_id.
The datatype of the ts column is TIMESTAMPTZ; the data is converted
from the Unix timestamp in the INSERT statement.
CREATE TABLE pihole.query
(
id BIGINT NOT NULL PRIMARY KEY,
ts TIMESTAMPTZ NOT NULL,
query_type_id INT NOT NULL,
status_id INT NOT NULL,
domain TEXT NOT NULL,
client TEXT NOT NULL,
forward TEXT NULL,
additional_info TEXT NULL
);
Load the data over. This takes about 45 seconds on my laptop.
INSERT INTO pihole.query
(id, ts, query_type_id, status_id, domain, client, forward, additional_info)
SELECT id, TO_TIMESTAMP(timestamp) AS ts, type AS query_type_id,
status AS status_id, domain, client, forward, additional_info
FROM pihole.query_fdw
;
The status_id
and query_type_id
columns will be used frequently in both
filters and joins. Create indexes on these columns to help improve performance.
CREATE INDEX ix_pihole_query_status_id
ON pihole.query (status_id);
CREATE INDEX ix_pihole_query_query_type_id
ON pihole.query (query_type_id);
The query_type_id
and status_id
columns have integer codes
defined in the docs for
query types
and
status types.
I create tables and INSERT
statements to store these lookup values.
pihole.query_type
CREATE TABLE pihole.query_type
(
id BIGINT NOT NULL PRIMARY KEY,
query_type TEXT NOT NULL,
CONSTRAINT uq_pihole_query_type UNIQUE (query_type)
);
COMMENT ON TABLE pihole.query_type IS 'Describes the type of DNS query.';
INSERT INTO pihole.query_type (id, query_type)
VALUES (1, 'A'), (2, 'AAAA'), (3, 'ANY'),
(4, 'SRV'), (5, 'SOA'), (6, 'PTR'),
(7, 'TXT'), (8, 'NAPTR'), (9, 'MX'),
(10, 'DS'), (11, 'RRSIG'), (12, 'DNSKEY'),
(13, 'Other'), (16, 'Unknown')
;
pihole.status
CREATE TABLE pihole.status
(
id INT NOT NULL PRIMARY KEY,
status TEXT NOT NULL,
allowed BOOLEAN NOT NULL,
description TEXT NOT NULL
);
COMMENT ON TABLE pihole.status IS 'Describes the status of the DNS query.';
INSERT INTO pihole.status (id, status, allowed, description)
VALUES (1, 'Unknown', False, 'was not answered by forward destination'),
(2, 'Blocked', False, 'Domain contained in gravity database'),
(3, 'Allowed', True, 'Forwarded'),
(4, 'Allowed', True, 'Known, replied to from cache'),
(5, 'Blocked', False, 'Domain contained in exact blacklist'),
(6, 'Blocked', False, 'By upstream server (known blocking page IP address)'),
(7, 'Blocked', False, 'By upstream server (0.0.0.0 or ::)'),
(8, 'Blocked', False, 'By upstream server (NXDOMAIN with RA bit unset)'),
(9, 'Blocked', False, 'Domain contained in gravity database. Blocked during deep CNAME inspection'),
(10, 'Blocked', False, 'Domain matched by a regex blacklist filter. Blocked during deep CNAME inspection'),
(11, 'Blocked', False, 'Domain contained in exact blacklist. Blocked during deep CNAME inspection')
;
Now a query to put these lookup tables to work. Using the descriptors instead of their ID values makes it easier to see and understand the data.
SELECT q.id, q.ts, q.query_type_id, qt.query_type, q.status_id, s.status,
s.allowed, q.domain, q.client, q.forward
FROM pihole.query q
INNER JOIN pihole.query_type qt ON q.query_type_id = qt.id
INNER JOIN pihole.status s ON q.status_id = s.id
WHERE q.id = 19493268
;
Results using psql's extended display mode (\x) show this query was for
an A record and was Allowed.
┌─[ RECORD 1 ]──┬────────────────────────┐
│ id            │ 19493268               │
│ ts            │ 2020-02-01 15:00:03+00 │
│ query_type_id │ 1                      │
│ query_type    │ A                      │
│ status_id     │ 3                      │
│ status        │ Allowed                │
│ allowed       │ t                      │
│ domain        │ tag100.lan             │
│ client        │ 172.16.0.240           │
│ forward       │ ¤                      │
└───────────────┴────────────────────────┘
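With the allowed flag in the pihole.status table, summary reporting is also straightforward. For example, a query along these lines gives daily allowed vs. blocked counts, grouping on days in the Pi's local timezone:
SELECT date_trunc('day', q.ts AT TIME ZONE 'America/Denver') AS query_date,
    s.allowed,
    COUNT(*) AS query_count
FROM pihole.query q
INNER JOIN pihole.status s ON q.status_id = s.id
GROUP BY query_date, s.allowed
ORDER BY query_date, s.allowed
;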
Next steps
The pihole-FTL database has additional tables I am interested in bringing into Postgres.
network
network_addresses
A quick peek at the structure of these tables in sqlite3 shows
some interesting details.
For these tables I will repeat the steps of creating the foreign table
and then creating a normal Postgres table with updated column names.
For example, the firstSeen column in the network table becomes first_seen,
and the lastSeen column in network_addresses becomes last_seen.
A sketch of this pattern for the network table follows the schema output below.
sqlite> .schema network
CREATE TABLE IF NOT EXISTS "network" ( id INTEGER PRIMARY KEY NOT NULL, hwaddr TEXT UNIQUE NOT NULL, interface TEXT NOT NULL, firstSeen INTEGER NOT NULL, lastQuery INTEGER NOT NULL, numQueries INTEGER NOT NULL, macVendor TEXT, aliasclient_id INTEGER);
sqlite> .schema network_addresses
CREATE TABLE IF NOT EXISTS "network_addresses" ( network_id INTEGER NOT NULL, ip TEXT UNIQUE NOT NULL, lastSeen INTEGER NOT NULL DEFAULT (cast(strftime('%s', 'now') as int)), name TEXT, nameUpdated INTEGER, FOREIGN KEY(network_id) REFERENCES network(id));
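Something along these lines should work for the network table (a sketch I have not yet run against this backup); the quoted camelCase columns keep the foreign table's names matching the SQLite schema, while the permanent table switches to snake_case and converts the Unix timestamps.
-- Foreign table mirroring the SQLite network table; column names must match
CREATE FOREIGN TABLE pihole.network_fdw
(
    id BIGINT NOT NULL,
    hwaddr TEXT NOT NULL,
    interface TEXT NOT NULL,
    "firstSeen" BIGINT NOT NULL,
    "lastQuery" BIGINT NOT NULL,
    "numQueries" BIGINT NOT NULL,
    "macVendor" TEXT NULL,
    aliasclient_id BIGINT NULL
)
SERVER pihole_backup_server
OPTIONS (TABLE 'network')
;
-- Permanent table with snake_case column names and converted timestamps
CREATE TABLE pihole.network
(
    id BIGINT NOT NULL PRIMARY KEY,
    hwaddr TEXT NOT NULL,
    interface TEXT NOT NULL,
    first_seen TIMESTAMPTZ NOT NULL,
    last_query TIMESTAMPTZ NOT NULL,
    num_queries BIGINT NOT NULL,
    mac_vendor TEXT NULL,
    aliasclient_id BIGINT NULL
);
-- Copy the data over, converting the Unix timestamps
INSERT INTO pihole.network
    (id, hwaddr, interface, first_seen, last_query, num_queries,
        mac_vendor, aliasclient_id)
SELECT id, hwaddr, interface,
    TO_TIMESTAMP("firstSeen") AS first_seen,
    TO_TIMESTAMP("lastQuery") AS last_query,
    "numQueries" AS num_queries,
    "macVendor" AS mac_vendor,
    aliasclient_id
FROM pihole.network_fdw
;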
This post is part of the series PostgreSQL: From Idea to Database.
Summary
Once again Postgres' foreign data wrappers have proven to be a fantastic tool!
Creating the sqlite_fdw extension was easy and connecting Postgres to the
Pi-hole's FTL database went smoothly. My favorite use for FDWs is as an
easy and efficient way to load data into Postgres from a variety of data
sources. I typically limit FDWs to temporary use as part of a larger data pipeline.