Setup Geocoder with PostGIS and Tiger/LINE
Geocoding addresses is the process of taking a street address and converting it to its location on a map. This post shows how to create a PostGIS geocoder using the U.S. Census Bureau's TIGER/Line data set. This is part one of a series of posts exploring geocoding addresses. The next post illustrates how to geocode in bulk with a focus on evaluating the accuracy of the resulting geometry data.
Before diving in, let's look at an example of geocoding.
The address for Union Station
(see on OpenStreetMap)
is 1701 Wynkoop Street, Denver, CO, 80202.
This address was the input to geocode.
The blue point shown in the following screenshot
is the resulting point from the
PostGIS geocode()
function. The pop-up dialog shows the address, a rating of 0,
and the calculated distance away from the OpenStreetMap polygon representing that address (13 meters), shown in red under the pop-up
dialog.
The 0 rating score is the best score, indicating it is likely to be accurate. The query used to generate the above image is below.
SELECT b.osm_id, b.address, gc.rating,
ST_Distance(ST_Transform(b.geom, 2773),
ST_Transform(gc.geomout, 2773)
)::INT AS distance,
ST_Transform(gc.geomout, 3857) AS geom_geocoded,
b.geom
FROM osm.building_polygon b
LEFT JOIN LATERAL geocode(b.address, 1) gc ON True
WHERE b.osm_id = 25650822
;
Setting up PostGIS with TIGER/Line geocoding
My process to setup the geocoder is based on Michele Tobias' tutorial from 2017. The steps I show here mainly outline the changes made to run on Ubuntu 22.04 instead of Windows.
At the time of writing, the 2022 TIGER/Line data was the latest available. I check this URL for newer data by adjusting the year: https://www2.census.gov/geo/tiger/TIGER2022. The 2021 and 2022 data had dates in September of their respective years, so I expect the 2023 data will be available soon.
Create Role and Database
The next thing to do is create a Postgres role to use for the geocoder
process. The resulting database and table will be owned by this role. The load process involves creating extensions, schemas, and tables,
so granting the superuser
role is an easy solution.
My approach to these needs has been to create the role initially
with superuser
and login
access. These privileges are
later revoked when no longer necessary.
sudo su - postgres
psql
CREATE ROLE geocode_user
WITH LOGIN SUPERUSER PASSWORD 'GeocodeUserPassword'
;
\q
Create the geocode2022
database and set the geocode_user
to the owner.
psql -U geocode_user -d postgres \
-c "CREATE DATABASE geocode2022 WITH OWNER geocode_user ; "
Connect to the geocode2022
database as the geocode_user
role.
psql -U geocode_user -d geocode2022
Create the required extensions.
CREATE EXTENSION postgis;
CREATE EXTENSION fuzzystrmatch;
CREATE EXTENSION postgis_tiger_geocoder;
CREATE EXTENSION address_standardizer;
Update loader platform
The next steps are to configure the details for the
postgis_tiger_geocoder
extension. This is done via the
tiger.loader_platform
and tiger.loader_variables
tables.
First, the loader_platform
is modified for the os = 'sh'
row.
The PGBIN
path should be updated if necessary, as well as all
of the Postgres connection details.
UPDATE tiger.loader_platform
SET declare_sect = 'TMPDIR="${staging_fold}/temp/"
UNZIPTOOL=unzip
WGETTOOL="/usr/bin/wget"
export PGBIN=/usr/lib/postgresql/15/bin
export PGPORT=5432
export PGHOST=localhost
export PGUSER=geocode_user
export PGPASSWORD=GeocodeUserPassword
export PGDATABASE=geocode2022
PSQL=${PGBIN}/psql
SHP2PGSQL=shp2pgsql
cd ${staging_fold}
'
WHERE os = 'sh';
WARNING: The above example stores the
geocode_user
's password in plain text in thetiger.loader_platform
table. This risk is mitigated in the cleanup steps below by revoking both superuser and login from this role.
Update the tiger.loader_variables
table to set the year,
the website path (including year) and the path to the staging folder.
UPDATE tiger.loader_variables
SET tiger_year = '2022',
website_root = 'https://www2.census.gov/geo/tiger/TIGER2022',
staging_fold = '/gisdata'
;
The above command sets the staging_fold
to the /gisdata
path.
The following commands create the folder and set ownership
to the postgres
role for the next steps.
sudo mkdir /gisdata
sudo chown postgres:postgres /gisdata
Prepare and run geocoder scripts
There are two geocoder scripts that need to be created and ran.
Creating the scripts is done as the postgres
Linux user.
sudo su - postgres
cd /gisdata
Create the nation script using the Loader_Generate_Nation_Script()
function. The --no-psqlrc -tA
avoids formatting issues in the generated
shell script.
psql -c "SELECT Loader_Generate_Nation_Script('sh')" \
-d geocode2022 \
--no-psqlrc -tA > /gisdata/nation_script_load.sh
Take a look at the generated script. After all, you wouldn't just auto-run scripts as a superuser suggested from someone on the internet, right?
nano /gisdata/nation_script_load.sh
Set the execution flag on the generated script and run it. This takes less than 2 minutes on my laptop.
chmod +x nation_script_load.sh
./nation_script_load.sh
The script above populates the tiger_data.county_all
and tiger_data.state_all
tables. Verify data was loaded.
SELECT COUNT(*) FROM tiger_data.county_all;
SELECT COUNT(*) FROM tiger_data.state_all;
┌───────┐
│ count │
╞═══════╡
│ 3235 │
└───────┘
┌───────┐
│ count │
╞═══════╡
│ 56 │
└───────┘
The same basic process is used to load the state data as was for the nation data. This step takes about 40 minutes on my laptop.
psql -c "SELECT Loader_Generate_Script(ARRAY['CO'], 'sh')" \
-d geocode2022 \
--no-psqlrc -tA > /gisdata/co_script_load.sh
chmod +x co_script_load.sh
./co_script_load.sh
The above script populates 11 tables with the necessary data for Colorado.
Cleanup
Checking the geocode_user
via \du
shows the role is a superuser
that can log in to Postgres.
([local] 🐘) ryanlambert@geocode2022=# \du geocode_user
List of roles
┌──────────────┬────────────┐
│ Role name │ Attributes │
╞══════════════╪════════════╡
│ geocode_user │ Superuser │
└──────────────┴────────────┘
The following query revokes both Superuser and login abilities from
the geocode_user
role.
ALTER ROLE geocode_user WITH NOSUPERUSER NOLOGIN;
Check \du
again, the Superuser
attribute is gone and the
Cannot login
attribute is displayed.
┌──────────────┬──────────────┐
│ Role name │ Attributes │
╞══════════════╪══════════════╡
│ geocode_user │ Cannot login │
└──────────────┴──────────────┘
Run the tiger.install_missing_indexes()
to ensure all columns used in geocoding
are properly indexed. Last, it won't hurt to ANALYZE
the database.
SELECT tiger.install_missing_indexes();
ANALYZE;
Geocode
The following query geocodes the address for Union Station
found in OpenStreetMap data from the osm.building_polygon
table.
SELECT b.osm_id, b.address, gc.rating,
ST_Distance(ST_Transform(b.geom, 2773),
ST_Transform(gc.geomout, 2773)
)::INT AS distance,
ST_Transform(gc.geomout, 3857) AS geom_geocoded,
b.geom
FROM osm.building_polygon b
LEFT JOIN LATERAL geocode(b.address, 1) gc ON True
WHERE b.osm_id = 25650822
;
┌──────────┬────────────────────────────────────────┬────────┬──────────┐
│ osm_id │ address │ rating │ distance │
╞══════════╪════════════════════════════════════════╪════════╪══════════╡
│ 25650822 │ 1701 Wynkoop Street, Denver, CO, 80202 │ 0 │ 13 │
└──────────┴────────────────────────────────────────┴────────┴──────────┘
Why not use an API?
In a world of APIs, why go through all this work to setup your own geocoder? There's even an API from the U.S. Census Bureau if you specifically want that data source. There are quite a few reasons it is beneficial to stand up your own internal geocoding service. Some commonly encountered reasons:
- Privacy
- Legal
- Cost
- Performance
- Terms of use
Using a geocoder locally where your data is stored can avoid a lot of caveats. Location data is often considered sensitive, this gives you more assurances of how the data is (and isn't!) being used.
Summary
The instructions above created a geocode2022
database
loaded with data to serve as a PostGIS TIGER/Line geocoder
for Colorado.
Continue reading with
Geocode Addresses with PostGIS and Tiger/LINE to explore bulk geocoding and the accuracy
of geocoding results.
Need help with your PostgreSQL servers or databases? Contact us to start the conversation!