My book Mastering PostGIS and OpenStreetMap is available!

# Setup Geocoder with PostGIS and Tiger/LINE

By Ryan Lambert -- Published October 08, 2023

Geocoding addresses is the process of taking a street address and converting it to its location on a map. This post shows how to create a PostGIS geocoder using the U.S. Census Bureau's TIGER/Line data set. This is part one of a series of posts exploring geocoding addresses. The next post illustrates how to geocode in bulk with a focus on evaluating the accuracy of the resulting geometry data.

Before diving in, let's look at an example of geocoding. The address for Union Station (see on OpenStreetMap) is 1701 Wynkoop Street, Denver, CO, 80202. This address was the input to geocode. The blue point shown in the following screenshot is the resulting point from the PostGIS `geocode()` function. The pop-up dialog shows the address, a rating of 0, and the calculated distance away from the OpenStreetMap polygon representing that address (13 meters), shown in red under the pop-up dialog.

The 0 rating score is the best score, indicating it is likely to be accurate. The query used to generate the above image is below.

``````SELECT b.osm_id, b.address, gc.rating,
ST_Distance(ST_Transform(b.geom, 2773),
ST_Transform(gc.geomout, 2773)
)::INT AS distance,
ST_Transform(gc.geomout, 3857) AS geom_geocoded,
b.geom
FROM osm.building_polygon b
LEFT JOIN LATERAL geocode(b.address, 1) gc ON True
WHERE b.osm_id = 25650822
;
``````

## Setting up PostGIS with TIGER/Line geocoding

My process to setup the geocoder is based on Michele Tobias' tutorial from 2017. The steps I show here mainly outline the changes made to run on Ubuntu 22.04 instead of Windows.

At the time of writing, the 2022 TIGER/Line data was the latest available. I check this URL for newer data by adjusting the year: https://www2.census.gov/geo/tiger/TIGER2022. The 2021 and 2022 data had dates in September of their respective years, so I expect the 2023 data will be available soon.

## Create Role and Database

The next thing to do is create a Postgres role to use for the geocoder process. The resulting database and table will be owned by this role. The load process involves creating extensions, schemas, and tables, so granting the `superuser` role is an easy solution. My approach to these needs has been to create the role initially with `superuser` and `login` access. These privileges are later revoked when no longer necessary.

``````sudo su - postgres
psql

CREATE ROLE geocode_user
;

\q
``````

Create the `geocode2022` database and set the `geocode_user` to the owner.

``````psql -U geocode_user -d postgres \
-c  "CREATE DATABASE geocode2022 WITH OWNER geocode_user ; "
``````

Connect to the `geocode2022` database as the `geocode_user` role.

``````psql -U geocode_user -d geocode2022
``````

Create the required extensions.

``````CREATE EXTENSION postgis;
CREATE EXTENSION fuzzystrmatch;
CREATE EXTENSION postgis_tiger_geocoder;
``````

The next steps are to configure the details for the `postgis_tiger_geocoder` extension. This is done via the `tiger.loader_platform` and `tiger.loader_variables` tables. First, the `loader_platform` is modified for the `os = 'sh'` row. The `PGBIN` path should be updated if necessary, as well as all of the Postgres connection details.

``````UPDATE tiger.loader_platform
SET declare_sect = 'TMPDIR="\${staging_fold}/temp/"
UNZIPTOOL=unzip
WGETTOOL="/usr/bin/wget"
export PGBIN=/usr/lib/postgresql/15/bin
export PGPORT=5432
export PGHOST=localhost
export PGUSER=geocode_user
export PGDATABASE=geocode2022
PSQL=\${PGBIN}/psql
SHP2PGSQL=shp2pgsql
cd \${staging_fold}
'
WHERE os = 'sh';
``````

WARNING: The above example stores the `geocode_user`'s password in plain text in the `tiger.loader_platform` table. This risk is mitigated in the cleanup steps below by revoking both superuser and login from this role.

Update the `tiger.loader_variables` table to set the year, the website path (including year) and the path to the staging folder.

``````UPDATE tiger.loader_variables
SET tiger_year = '2022',
website_root = 'https://www2.census.gov/geo/tiger/TIGER2022',
staging_fold = '/gisdata'
;
``````

The above command sets the `staging_fold` to the `/gisdata` path. The following commands create the folder and set ownership to the `postgres` role for the next steps.

``````sudo mkdir /gisdata
sudo chown postgres:postgres /gisdata
``````

## Prepare and run geocoder scripts

There are two geocoder scripts that need to be created and ran. Creating the scripts is done as the `postgres` Linux user.

``````sudo su - postgres
cd /gisdata
``````

Create the nation script using the `Loader_Generate_Nation_Script()` function. The `--no-psqlrc -tA` avoids formatting issues in the generated shell script.

``````psql -c "SELECT Loader_Generate_Nation_Script('sh')" \
-d geocode2022 \
``````

Take a look at the generated script. After all, you wouldn't just auto-run scripts as a superuser suggested from someone on the internet, right?

``````nano /gisdata/nation_script_load.sh
``````

Set the execution flag on the generated script and run it. This takes less than 2 minutes on my laptop.

``````chmod +x nation_script_load.sh
``````

The script above populates the `tiger_data.county_all` and `tiger_data.state_all` tables. Verify data was loaded.

``````SELECT COUNT(*) FROM tiger_data.county_all;
SELECT COUNT(*) FROM tiger_data.state_all;

┌───────┐
│ count │
╞═══════╡
│  3235 │
└───────┘

┌───────┐
│ count │
╞═══════╡
│    56 │
└───────┘
``````

The same basic process is used to load the state data as was for the nation data. This step takes about 40 minutes on my laptop.

``````psql -c "SELECT Loader_Generate_Script(ARRAY['CO'], 'sh')" \
-d geocode2022 \

``````

The above script populates 11 tables with the necessary data for Colorado.

## Cleanup

Checking the `geocode_user` via `\du` shows the role is a superuser that can log in to Postgres.

``````([local] 🐘) ryanlambert@geocode2022=# \du geocode_user
List of roles
┌──────────────┬────────────┐
│  Role name   │ Attributes │
╞══════════════╪════════════╡
│ geocode_user │ Superuser  │
└──────────────┴────────────┘
``````

The following query revokes both Superuser and login abilities from the `geocode_user` role.

``````ALTER ROLE geocode_user WITH NOSUPERUSER NOLOGIN;
``````

Check `\du` again, the `Superuser` attribute is gone and the `Cannot login` attribute is displayed.

``````┌──────────────┬──────────────┐
│  Role name   │  Attributes  │
╞══════════════╪══════════════╡
│ geocode_user │ Cannot login │
└──────────────┴──────────────┘
``````

Run the `tiger.install_missing_indexes()` to ensure all columns used in geocoding are properly indexed. Last, it won't hurt to `ANALYZE` the database.

``````SELECT tiger.install_missing_indexes();
ANALYZE;
``````

## Geocode

The following query geocodes the address for Union Station found in OpenStreetMap data from the `osm.building_polygon` table.

``````SELECT b.osm_id, b.address, gc.rating,
ST_Distance(ST_Transform(b.geom, 2773),
ST_Transform(gc.geomout, 2773)
)::INT AS distance,
ST_Transform(gc.geomout, 3857) AS geom_geocoded,
b.geom
FROM osm.building_polygon b
LEFT JOIN LATERAL geocode(b.address, 1) gc ON True
WHERE b.osm_id = 25650822
;

┌──────────┬────────────────────────────────────────┬────────┬──────────┐
│  osm_id  │                address                 │ rating │ distance │
╞══════════╪════════════════════════════════════════╪════════╪══════════╡
│ 25650822 │ 1701 Wynkoop Street, Denver, CO, 80202 │      0 │       13 │
└──────────┴────────────────────────────────────────┴────────┴──────────┘
``````

## Why not use an API?

In a world of APIs, why go through all this work to setup your own geocoder? There's even an API from the U.S. Census Bureau if you specifically want that data source. There are quite a few reasons it is beneficial to stand up your own internal geocoding service. Some commonly encountered reasons:

• Privacy
• Legal
• Cost
• Performance
The instructions above created a `geocode2022` database loaded with data to serve as a PostGIS TIGER/Line geocoder for Colorado. Continue reading with Geocode Addresses with PostGIS and Tiger/LINE to explore bulk geocoding and the accuracy of geocoding results.