RustProof Labs: blogging for education (logo)

Load and query Pi-hole data from Postgres

By Ryan Lambert -- Published February 01, 2021

I have used Pi-hole on our local network for a few years now. It is running on a dedicated Raspberry Pi 3B attached to the router (Netgear Nighthawk) to provide fast local DNS/DCHP while blocking ads at the network level. The built-in Pi-hole web interface allows for some basic querying/reporting of the collected data, but it's a bit limited and quite slow as the data grows over time. My current pihole-FTL.db database is 1.4 GB and contains 12 months of data.

$ ls -alh /etc/pihole/pihole-FTL.db
-rw-r--r--  1 pihole pihole 1.4G Jan 31 14:04 pihole-FTL.db

Pi-hole saves its data in a few SQLite databases with the FTL database (Faster Than Light) being the most interesting. While I could try to work with the data directly in SQLite, I strongly prefer Postgres and decided this was a great time to give the pgspider/sqlite_fdw extension a try. This post goes over the steps I took to bring Pi-hole data into Postgres from its sqlite data source.

See my previous post on using file_fdw for more about Postgres' Foreign Data Wrappers.

Setup

Backup the SQLite database and move it off the Pi. I saved the backup file on my laptop where I have Postgres 13 installed. The path of the backup file on my local laptop is /data/pihole/pihole-FTL--20210131.db. Next is to make the sqlite_fdw extension available. This extension is not part of the standard Postgres installation. Install the sqlite_fdw extension on the server per the repo's instructions (e.g. clone, make, sudo make install).

Create a pihole database and connect to it via psql (or your favorite GUI, e.g. DBeaver).

psql -c "CREATE DATABASE pihole;"
psql -d pihole

Create the extension that will let us query the SQLite database directly from Postgres. Also create the FDW server named pihole_backup_server pointing at the backup .db file.

CREATE EXTENSION sqlite_fdw;
CREATE SERVER pihole_backup_server
    FOREIGN DATA WRAPPER sqlite_fdw
    OPTIONS (DATABASE '/data/pihole/pihole-FTL--20210131.db')
;

I use schemas to organize data instead of putting data in the default public schema.

CREATE SCHEMA pihole;
COMMENT ON SCHEMA pihole IS 'Data loaded from Pi-hole.';

Foreign Table

The data I am most interested in is the table named queries. Lucky for me, the Pi-hole docs do a good job describing the table structure. The other way to see the table structure is to use sqlite3 and .schema query to get the code.

The foreign table is created to match the structure in SQLite. Make sure to use the same column names and data types to avoid errors.

CREATE FOREIGN TABLE pihole.query_fdw
(
    id BIGINT NOT NULL,
    timestamp BIGINT NOT NULL,
    type INT NOT NULL,
    status INT NOT NULL,
    domain TEXT NOT NULL,
    client TEXT NOT NULL,
    forward TEXT NULL,
    additional_info TEXT NULL
)
SERVER pihole_backup_server
OPTIONS (TABLE 'queries')
;
COMMENT ON FOREIGN TABLE pihole.query_fdw IS 'Linked to Pi-hole query table of local backup';

With the pihole.query_fdw foreign table created, attempt a simple query. This is helpful to ensure the foreign table was created successfully and to start getting an idea of what the data looks like.

SELECT * FROM pihole.query_fdw LIMIT 1;

┌──────────┬────────────┬──────┬────────┬────────────┬──────────────┬─────────┬─────────────────┐
│    id    │ timestamp  │ type │ status │   domain   │    client    │ forward │ additional_info │
╞══════════╪════════════╪══════╪════════╪════════════╪══════════════╪═════════╪═════════════════╡
│ 19493268 │ 1580569203 │    1 │      3 │ tag100.lan │ 172.16.0.240 │ ¤       │ ¤               │
└──────────┴────────────┴──────┴────────┴────────────┴──────────────┴─────────┴─────────────────┘

Exploring the data

The Pi-hole documentation told us the timestamp column is a Unix timestamp. We can convert this epoch to a Postgres timestamptz value with the to_timestamp() function. The Pi generating this data has its timezone set to America/Denver. Informing Postgres of the timezone is important to ensure data lines up properly with expectations.

SELECT TO_TIMESTAMP(timestamp) AT TIME ZONE 'America/Denver' AS ts
    FROM pihole.query_fdw
    WHERE id = 19493268;

┌────────────────────────┐
│           ts           │
╞════════════════════════╡
│ 2020-02-01 15:00:03+00 │
└────────────────────────┘

Another query to look at the row count and range of dates included. At this point it's important to validate that the max_ts lines up with when the backup was taken. If there was a problem setting the timezone properly it should be apparent here.

SELECT COUNT(*),
        MIN(to_timestamp(timestamp))
            AT TIME ZONE 'America/Denver' AS min_ts,
        MAX(to_timestamp(timestamp))
            AT TIME ZONE 'America/Denver' AS max_ts
    FROM pihole.query_fdw
;

┌──────────┬─────────────────────┬─────────────────────┐
│  count   │       min_ts        │       max_ts        │
╞══════════╪═════════════════════╪═════════════════════╡
│ 19386278 │ 2020-02-01 08:00:03 │ 2021-01-31 08:57:06 │
└──────────┴─────────────────────┴─────────────────────┘

Load into Postgres

With more than 19 million rows in the foreign table, performance will not be as good as I want. Relying on the FDW would also require keeping the SQLite database file around instead of just relying on my Postgres backups. Create a normal table for long-term storage and bring the data from the foreign table. This permanent table adds a PRIMARY KEY to the id column and renames some other columns.

The datatype of the ts column is TIMESTAMPTZ, the data will be converted from the Unix timestamp in the insert statement.

CREATE TABLE pihole.query
(
    id BIGINT NOT NULL PRIMARY KEY,
    ts TIMESTAMPTZ NOT NULL,
    query_type_id INT NOT NULL,
    status_id INT NOT NULL,
    domain TEXT NOT NULL,
    client TEXT NOT NULL,
    forward TEXT NULL,
    additional_info TEXT NULL
);

Load data over. This takes about 45 seconds on my laptop.

INSERT INTO pihole.query
    (id, ts, query_type_id, status_id, domain, client, forward, additional_info)   
SELECT id, TO_TIMESTAMP(timestamp) AS ts, type AS query_type_id,
        status AS status_id, domain, client, forward, additional_info
    FROM pihole.query_fdw
;

The status_id and query_type_id columns will be used frequently in both filters and joins. Create indexes on these columns to help improve performance.

CREATE INDEX ix_pihole_query_status_id
    ON pihole.query (status_id);
CREATE INDEX ix_pihole_query_query_type_id
    ON pihole.query (query_type_id);

The query_type_id and status_id columns have integer codes defined in the docs for query types and status types. I create tables and INSERT statements to store these lookup values.

pihole.query_type

CREATE TABLE pihole.query_type
(
    id BIGINT NOT NULL PRIMARY KEY,
    query_type TEXT NOT NULL,
    CONSTRAINT uq_pihole_query_type UNIQUE (query_type)
);
COMMENT ON TABLE pihole.query_type IS 'Describes the type of DNS query.'

INSERT INTO pihole.query_type (id, query_type)
    VALUES (1, 'A'), (2, 'AAAA'), (3, 'ANY'),
        (4, 'SRV'), (5, 'SOA'), (6, 'PTR'),
        (7, 'TXT'), (8, 'NAPTR'), (9, 'MX'),
        (10, 'DS'), (11, 'RRSIG'), (12, 'DNSKEY'),
        (13, 'Other'), (16, 'Unknown')
        ;

pihole.status

CREATE TABLE pihole.status
(
    id INT NOT NULL PRIMARY KEY,
    status TEXT NOT NULL,
    allowed BOOLEAN NOT NULL,
    description TEXT NOT NULL
);
COMMENT ON TABLE pihole.status IS 'Describes the status of the DNS query.'

INSERT INTO pihole.status (id, status, allowed, description)
    VALUES (1, 'Unknown', False, 'was not answered by forward destination'),
        (2, 'Blocked', False, 'Domain contained in gravity database'),
        (3, 'Allowed', True, 'Forwarded'),
        (4, 'Allowed', True, 'Known, replied to from cache'),
        (5, 'Blocked', False, 'Domain contained in exact blacklist'),
        (6, 'Blocked', False, 'By upstream server (known blocking page IP address)'),
        (7, 'Blocked', False, 'By upstream server (0.0.0.0 or ::)'),
        (8, 'Blocked', False, 'By upstream server (NXDOMAIN with RA bit unset)'),
        (9, 'Blocked', False, 'Domain contained in gravity database. Blocked during deep CNAME inspection'),
        (10, 'Blocked', False, 'Domain matched by a regex blacklist filter. Blocked during deep CNAME inspection'),
        (11, 'Blocked', False, 'Domain contained in exact blacklist. Blocked during deep CNAME inspection')
    ;

Now a query to put these lookup tables to work. Using the descriptors instead of their ID values makes it easier to see and understand the data.

SELECT q.id, q.ts, q.query_type_id, qt.query_type, q.status_id, s.status,  
        s.allowed, q.domain, q.client, q.forward
    FROM pihole.query q
    INNER JOIN pihole.query_type qt ON q.query_type_id = qt.id
    INNER JOIN pihole.status s ON q.status_id = s.id
    WHERE q.id = 19493268
;

Results using psql's extended display mode (\x) show this query was for the A record and was Allowed.

┌─[ RECORD 1 ]──┬────────────────────────┐
│ id            │ 19493268               │
│ ts            │ 2020-02-01 15:00:03+00 │
│ query_type_id │ 1                      │
│ query_type    │ A                      │
│ status_id     │ 3                      │
│ status        │ Allowed                │
│ allowed       │ t                      │
│ domain        │ tag100.lan             │
│ client        │ 172.16.0.240           │
│ forward       │ ¤                      │
└───────────────┴────────────────────────┘

Next steps

The PiHole-FTL database has additional tables I am interested in bringing in to Postgres.

A quick peek at the structure of these tables in sqlite shows some interesting details. For these tables I will repeat the steps of creating the foreign table and then creating a normal Postgres table with updated column names. For example, in the network table the firstSeen and lastSeen columns will become first_seen and last_seen.

sqlite> .schema network
CREATE TABLE IF NOT EXISTS "network" ( id INTEGER PRIMARY KEY NOT NULL, hwaddr TEXT UNIQUE NOT NULL, interface TEXT NOT NULL, firstSeen INTEGER NOT NULL, lastQuery INTEGER NOT NULL, numQueries INTEGER NOT NULL, macVendor TEXT, aliasclient_id INTEGER);

sqlite> .schema network_addresses 
CREATE TABLE IF NOT EXISTS "network_addresses" ( network_id INTEGER NOT NULL, ip TEXT UNIQUE NOT NULL, lastSeen INTEGER NOT NULL DEFAULT (cast(strftime('%s', 'now') as int)), name TEXT, nameUpdated INTEGER, FOREIGN KEY(network_id) REFERENCES network(id));

This post is part of the series PostgreSQL: From Idea to Database.

Summary

Once again Postgres' foreign data wrappers have proven to be a fantastic tool! Creating the sqlite_fdw extension was easy and connecting Postgres to the Pi-hole's FTL database went smoothly. My favorite use for FDWs is for an easy and efficient way to load data into Postgres from a variety of data sources. I typically limit the use of FDWs to temporary use as part of a larger data pipeline.

Need help with your PostgreSQL servers or databases? Contact us to start the conversation!

By Ryan Lambert
Published February 01, 2021
Last Updated February 01, 2021