Points to consider when building a reverse geocoder
Are you ready to roll your own reverse geocoder?
If your use cases are specific enough, that may be the best way to get the exact results you need.
On the other hand, look for open data and tools along the way so you don't have to do all the work yourself.
We'll cover five important steps to build your own reverse geocoder:
- Determine what's needed for your use case
- Find and stage your data sources
- Determine what data you will return
- Format the data how users expect
- Maintain and improve your geocoder
When you build a geocoder, there's a lot that goes into it.
You want to consider the effort required to find data, keep it updated, write the code (or tune an existing geocoder), and maintain that code.
That all starts with a crystal clear idea of how you'll use it.
Determine What's Needed for Your Use Case
Your reverse geocoding use cases determine how complex the requirements are to build your own. For example, if you simply need to know the continent of a location, you may not even need a database to produce a result. On the other hand, returning fully formatted addresses and various annotations about a place may find you running multiple geocoders.
We'll look at the types of reverse geocoders, review the effort required to build them, and see how they can be assembled into a single geocoding solution.
Types of Reverse Geocoders
The general definition for a reverse geocoder is incredibly broad: based on latitude and longitude coordinates, return a human-readable description of the location. The location might be a very specific place, a nearby landmark, or a larger area.
The geography represented by a reverse geocoder is typically one of the following:
- A point near the coordinates
- A path near the coordinates
- A shape surrounding the coordinates
These three geographic features generically describe many different types of place data. Here are some common results from geocoders:
- Full address of location (point)
- Country, region, county/province, or state boundaries (shape)
- Road or highway (path)
- City or postal code boundaries (shape)
- Landmark or prominent point of interest (point)
- Ocean and sea boundaries (shape)
- Rivers and streams (path)
- Time zones (shape)
In addition, there are many other pieces of information you might be able to infer based on a location. For example, OpenCage returns currency, calling code, sunrise/sunset, and more. The full list is available within our
There are many potential areas of geographic intelligence that a reverse geocoder can return. Many have already been implemented by others, often with data easily sourced. In other circumstances, you might need to compile it on your own.
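The point and shape lookups described above can be sketched in a few lines. This is a minimal illustration, not how any particular geocoder is implemented: the function names and sample places are hypothetical, the distance uses the haversine formula, and the shape test is a plain ray-casting check that ignores holes and polygons crossing the antimeridian. A path lookup would add a point-to-segment distance, omitted here for brevity.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates, in kilometres."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def nearest_point(lat, lon, places):
    """Point lookup: return the (name, lat, lon) tuple closest to the query."""
    return min(places, key=lambda p: haversine_km(lat, lon, p[1], p[2]))

def point_in_polygon(lat, lon, ring):
    """Shape lookup: ray-casting test against a list of (lat, lon) vertices."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        lat_i, lon_i = ring[i]
        lat_j, lon_j = ring[j]
        # Does this edge straddle the query latitude?
        if (lat_i > lat) != (lat_j > lat):
            crossing_lon = lon_i + (lat - lat_i) * (lon_j - lon_i) / (lat_j - lat_i)
            if lon < crossing_lon:
                inside = not inside
        j = i
    return inside

# Hypothetical sample data for illustration only.
places = [("Berlin", 52.52, 13.405), ("Paris", 48.8566, 2.3522)]
square = [(0.0, 0.0), (0.0, 10.0), (10.0, 10.0), (10.0, 0.0)]

print(nearest_point(52.4, 13.0, places)[0])
print(point_in_polygon(5.0, 5.0, square))
```

A linear scan like this is fine for a handful of features. Larger datasets usually call for a spatial index, which is part of what a database such as PostGIS provides.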
Should You Even Build a Geocoder?
After seeing the many types of reverse geocoders, you may realize the complexities that hide beneath the surface of a simple-looking service. The decision of whether to build or buy a geocoder is specific to your needs and resources.
Here are some helpful questions to answer:
How many types of geocoders will you need?
Take stock of the effort required to build and maintain your reverse geocoding toolset.
Is the data easily accessible for the geocoders you require?
You'll need a process to acquire and update your datasets.
Do you have in-house geographic information system (GIS) expertise?
Most developers can get up to speed, but the learning curve could be steep.
Will you need to incorporate proprietary datasets?
If you use data that commercial geocoders cannot access, you may only be able to roll your own.
Are you able to commit to maintaining your reverse geocoder?
For fast-changing datasets, the initial build time may be a fraction of the long term effort.
Assemble Your Reverse Geocoder Pieces
Once you know the geocoders needed for your use case, you can put them together, likely as a single service. It might be possible to call each geocoder separately, but that puts additional pressure on your applications to determine what result each service produces.
For example, if you require a full address and time zone, those are likely two separate geocoders. Your application could request the address from the first geocoder, then call the second geocoder for the time zone. Alternatively, write wrapper code - or your own API service - to call each geocoder and return both results at once.
At OpenCage we run 10 geocoders that are exposed via a single, flexible API.
You can see the full list
Behind the scenes, we run microservices for each geocoder, which are called by a service that compiles the results into a single API response.
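As a rough sketch of the wrapper approach described above, the following combines two geocoders behind one function. Everything here is an assumption for illustration: the two geocoder functions are stubs that return canned data, and `reverse_geocode` is a hypothetical name, not part of any real API. In practice each stub would be a call to a separate service.

```python
from typing import Callable, Dict

Geocoder = Callable[[float, float], dict]

def address_geocoder(lat: float, lon: float) -> dict:
    # Stub: a real implementation would query an address dataset.
    return {"address": "Rustaveli Avenue 8"}

def timezone_geocoder(lat: float, lon: float) -> dict:
    # Stub: a real implementation would test time zone polygons.
    return {"timezone": "Asia/Tbilisi"}

def reverse_geocode(lat: float, lon: float, geocoders: Dict[str, Geocoder]) -> dict:
    """Call every geocoder and merge the results into one response."""
    result = {"lat": lat, "lon": lon}
    for name, geocoder in geocoders.items():
        try:
            result.update(geocoder(lat, lon))
        except Exception as exc:
            # One failing geocoder should not sink the whole response.
            result.setdefault("errors", {})[name] = str(exc)
    return result

combined = reverse_geocode(
    41.70, 44.80,
    {"address": address_geocoder, "timezone": timezone_geocoder},
)
print(combined)
```

Recording failures per geocoder, rather than raising, lets the calling application decide whether a partial answer is good enough.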
Find and Stage Your Data Sources
A reverse geocoder relies on one or more data sources to determine nearby and geographic features. There are many places to discover this data, covered in the
reverse geocoding resources
Typically, data sources fall into a few categories:
- Public domain
Each of these takes effort to compile and prepare. A licensed data source may reduce this effort, though pricing can vary. The most cost effective approach is to review whether open data sources can meet your needs.
Open Geo Data for Geocoders
The best publicly available source for worldwide geographic data is OpenStreetMap (OSM). A global community of mappers creates and maintains the dataset.
Like Wikipedia for the physical world, anyone can make edits - and many do.
For over 15 years, volunteers have uploaded tracks from GPS-enabled devices, labeled landmarks, and corrected OSM data. Due to the large number of contributions, OSM is a robust source to use for many projects, including reverse geocoding.
The biggest advantage of open data like OSM is that you can use it with very few restrictions.
Compared to proprietary data, there are no commercial limitations or requirements to combine it with other services, though the OSM license does require attribution and share-alike for derived databases.
OpenCage is a proud corporate member of the OSM Foundation, and we do a lot to support the geo community. We contribute to the tools, host geo innovation events, and have long been committed geo geeks.
Install OpenStreetMap on Your Servers
It takes a lot of data to describe the whole world. Before you download OSM data, determine the regions that your reverse geocoder needs to support. A subset of OSM data is much easier to download and install, if that's an option.
If you need worldwide coverage, you'll need to grab the latest snapshot of OSM data. This initial download is nearly a terabyte of data. It includes every feature, line, and label needed to build many projects from OSM - geocoders, maps, and more. It can take up to a week to install!
You can start with a subset of data defined by a bounding box. This can help you determine whether OSM is a fit for your use case without committing all the time and bandwidth to the worldwide dataset. There are downloads of full and partial data on the OSM site.
Once you have an initial dataset, you'll be able to use much smaller downloads to incorporate the community's frequent updates.
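To make the bounding box idea concrete, here is a small sketch that clips a list of features to a region. The `BoundingBox` class, the `clip_to_bbox` helper, and the sample features are all hypothetical. Real OSM extracts are cut with dedicated tooling, and this simple version does not handle boxes that cross the antimeridian.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BoundingBox:
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        """True when the coordinate falls inside the box (edges included)."""
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)

def clip_to_bbox(features, bbox):
    """Keep only features whose coordinates fall inside the bounding box."""
    return [f for f in features if bbox.contains(f["lat"], f["lon"])]

# Hypothetical sample: a rough box around Germany and three cities.
germany = BoundingBox(min_lat=47.0, min_lon=5.5, max_lat=55.5, max_lon=15.5)
features = [
    {"name": "Berlin", "lat": 52.52, "lon": 13.405},
    {"name": "Atlanta", "lat": 33.749, "lon": -84.388},
    {"name": "Munich", "lat": 48.137, "lon": 11.575},
]
subset = clip_to_bbox(features, germany)
print([f["name"] for f in subset])
```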
Nominatim as a Base Geocoder
In addition to the data itself, there are a number of tools in the OSM ecosystem. Popular among those is Nominatim, a geocoder first written to help community mappers. You can use it to find places by name, address, or location.
You can try out Nominatim for reverse geocoding with the OSM-hosted version of the service. It supports millions of queries every day, but is not built for production projects. It serves the search bar for the primary OpenStreetMap site, which helps the community contribute new data to the project.
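For experimenting, a reverse lookup against Nominatim is a single HTTP GET. The sketch below only builds the request URL and sends nothing, since the hosted service is not meant for production traffic. The helper name `build_reverse_url` is hypothetical. The `lat`, `lon`, `format`, and `zoom` parameters follow the public Nominatim reverse API as documented at the time of writing, so verify them against the current documentation, and pass your own `base_url` if you run your own server.

```python
from urllib.parse import urlencode

# Public OSM-hosted endpoint; swap in your own server for real workloads.
NOMINATIM_REVERSE = "https://nominatim.openstreetmap.org/reverse"

def build_reverse_url(lat, lon, zoom=18, base_url=NOMINATIM_REVERSE):
    """Build a reverse geocoding URL; zoom controls the level of detail."""
    params = {
        "lat": f"{lat:.6f}",
        "lon": f"{lon:.6f}",
        "format": "jsonv2",
        "zoom": zoom,
    }
    return f"{base_url}?{urlencode(params)}"

print(build_reverse_url(52.52, 13.405))
```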
The Nominatim documentation provides details on installing your own server, which you can pair with OSM data.
Among the Nominatim requirements are:
- PostgreSQL 9.3 or later
- PostGIS 2.2 or later
- PHP 7.0 or later
There are also some prerequisites to compile and test Nominatim, as well as apply OSM updates.
Keep in mind that by default it will search everything included in OSM. You'll return results for a lot of data you might not want to include in a reverse geocoder, like park benches. With some feature filtering, Nominatim can be a great foundational geocoder.
Nominatim is the most important of the 10 geocoders run by OpenCage. In fact, one of the OpenCage founders is a top Nominatim contributor. However, it can take some tuning for general purpose geocoding, which comes down to formatting and selecting the appropriate data to return.
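The feature filtering mentioned above might look something like the following sketch. It assumes each candidate result is a dictionary with `category` and `type` keys, similar to the fields in Nominatim's jsonv2 output, and it uses a hypothetical deny list. The right list depends entirely on your use case.

```python
# Hypothetical deny list of (category, type) pairs based on common OSM tags.
EXCLUDED_FEATURES = {
    ("amenity", "bench"),
    ("amenity", "waste_basket"),
    ("highway", "street_lamp"),
}

def filter_features(candidates, excluded=EXCLUDED_FEATURES):
    """Drop candidates whose (category, type) pair is on the deny list."""
    return [
        c for c in candidates
        if (c.get("category"), c.get("type")) not in excluded
    ]

# Hypothetical candidates near a query point, closest first.
candidates = [
    {"name": "Bench", "category": "amenity", "type": "bench"},
    {"name": "City Park", "category": "leisure", "type": "park"},
    {"name": "Main Street", "category": "highway", "type": "residential"},
]
kept = filter_features(candidates)
print([c["name"] for c in kept])
```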
Determine What Data You Will Return
Earlier we described how your use case influences what type of reverse geocoder you need. Similarly, the data that you return from your geocoder is impacted by what your application needs. From potentially terabytes of data, you need to decide what small subset to return.
You may only need high-level data, or perhaps address-level details. And what contextual data needs to come along? There's a lot to consider, even when running a single geocoder.
How Granular is Your Geocoder?
The simplest reverse geocoders will return a single text string that describes the location. Even within that basic use case are a number of decisions. The top consideration is granularity - what level of naming do you want to include?
There are many potential names for a place. For example, consider this non-exhaustive list in roughly hierarchical order:
- Multi-country region
- Intra-country region
- Province or state
- Metropolitan area
- County or other sub-division
- Postal or ZIP code
- Census designation
- Road or highway
- Street address
As you can see, there's nothing "simple" about a single text string. At the very least, it's hiding a lot of complexity and decisions behind the scenes. In many cases, you'll want to include more than one of the above names. A complete address would include the street address, as well as additional city and regional information.
Finally, you might also include geographic data along with the written labels for a place. Here you also need to make a decision about granularity. If you return the country name, do you also provide the geographic center of the nation? Based on your needs, you might even want the boundaries of the country. You'll need to determine the scale for those boundaries, as more granularity will return larger data.
Each of the decisions you make about granularity impacts how you will build, maintain, and consume your reverse geocoder.
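One way to picture the granularity decision is a function that builds a label from named components, stopping at a chosen level. This is a sketch under stated assumptions: the `LEVELS` ordering, the component keys, and the `describe` function are hypothetical and far simpler than the hierarchy listed above.

```python
# Coarse-to-fine ordering; an assumption for this sketch, not a standard.
LEVELS = ["country", "state", "county", "city", "postcode", "road", "house_number"]

def describe(components, finest="city", max_parts=2):
    """Join the most specific available names, no finer than `finest`."""
    allowed = LEVELS[: LEVELS.index(finest) + 1]
    # Walk from the finest allowed level up to the coarsest.
    parts = [components[level] for level in reversed(allowed) if level in components]
    return ", ".join(parts[:max_parts])

components = {
    "country": "United States of America",
    "state": "Georgia",
    "city": "Atlanta",
    "road": "Martin Luther King Junior Drive Southeast",
}

print(describe(components, finest="city"))     # Atlanta, Georgia
print(describe(components, finest="country"))  # United States of America
```

Even this toy version forces two choices, the finest level and the number of parts, which is a hint of how many decisions hide behind a single text string.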
Beware of Annotation Creep
It can be tempting to keep adding things for your reverse geocoder to return. The location intelligence can become addictive! Each additional data point, which we call an annotation, adds complexity in your geocoder and in the applications it powers. You'll need to decide what data your use case really needs and weigh that against the effort required to produce it.
OpenCage includes a number of annotations based on customer requests and our own research. Some of the most useful supplementary data includes:
- Speed limit
- Time zone
- Local currency
- Sunrise and sunset
- United Nations code
You can see a full list in our
We periodically add new data to expand what we provide. Of course, we run a geocoding service, so that's our job. You should think carefully before adding more layers of results to your reverse geocoder.
Each annotation becomes another project to build and maintain. We run 10 geocoders, but you probably shouldn't. The reverse geocoder tutorial uses a sample oceans dataset to show how you might add regional annotations to your project. While a basic example, it gives a taste of what's required to include even a simple annotation in your geocoder.
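As a small illustration of what an annotation involves, the sketch below attaches country-level data to a result only when it is requested. The lookup table, the `annotate` function, and the `country_code` key are hypothetical. A real table would need every country and a process to keep it current, which is exactly the maintenance cost described above.

```python
# Tiny hypothetical lookup table keyed by ISO 3166-1 alpha-2 country code.
COUNTRY_ANNOTATIONS = {
    "DE": {"currency": "EUR", "calling_code": 49},
    "US": {"currency": "USD", "calling_code": 1},
    "GE": {"currency": "GEL", "calling_code": 995},
}

def annotate(result, requested=("currency",)):
    """Return a copy of the result with only the requested annotations."""
    code = result.get("country_code", "").upper()
    available = COUNTRY_ANNOTATIONS.get(code, {})
    annotated = dict(result)
    annotated["annotations"] = {
        key: available[key] for key in requested if key in available
    }
    return annotated

result = {"formatted": "Rustaveli Avenue 8", "country_code": "ge"}
print(annotate(result, requested=("currency", "calling_code")))
```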
Format the Data the Way Your Users Expect
The entire point of reverse geocoding is to take a machine-readable location and make it something humans can easily understand. Knowing what data to return is not the same as determining how to return it.
Reverse geocoding exposes itself as the end user experience. A result that doesn't consider what users expect might provide mental hurdles to the expected meaning. Great reverse geocoding will feel natural, which means paying attention to how data is formatted.
Localization and Localisation
Users expect your application to speak their language, which means your reverse geocoder will also need to be understood internationally. If English is the only language you support, it's easier. But if you support multiple countries, there are differences you should consider.
Regions, countries, and bodies of water can have completely different names. Based upon your approach above, you can determine what to send to the user. However, this requires that your data is available in each language you intend to support. Some datasets may already have multiple languages available, while others will need to be translated.
There are three common approaches you might take to localization:
- The language set in the current web browser
- The native language of the current location
- The user's preferred language
OpenCage makes many of our results available in different languages. We support all three of the common approaches mentioned above by
using the optional
when making an API request.
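The three approaches to localization can be combined into one fallback chain. The sketch below is one possible ordering (explicit user preference first, then the browser's Accept-Language header, then the local language), which is an assumption and not a recommendation from any standard. The function names are hypothetical, and the header parsing is simplified.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # default weight when no q-value is given
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        # Sort by descending quality, then by original position.
        weighted.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(weighted)]

def choose_language(supported, user_pref=None, accept_language="",
                    local_language=None, default="en"):
    """Pick the first supported language from the fallback chain."""
    candidates = []
    if user_pref:
        candidates.append(user_pref.lower())
    candidates.extend(parse_accept_language(accept_language))
    if local_language:
        candidates.append(local_language.lower())
    for tag in candidates:
        if tag in supported:
            return tag
        base = tag.split("-")[0]  # fall back from "fr-ch" to "fr"
        if base in supported:
            return base
    return default

print(choose_language({"en", "de"}, accept_language="fr-CH, fr;q=0.9, de;q=0.7, en;q=0.8"))
```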
Addresses of the World
Perhaps even harder than multiple languages are the various ways addresses are formatted. The ordering and inclusion of address components is different everywhere. If you attempt to apply your formatting bias to every result, you'll end up with confused users.
For example, a postal code may appear before a city, after a state, or in many other locations (or not at all!). Here's the reverse geocode result for the capitol of the US state of Georgia:
Martin Luther King Junior Drive Southeast
Atlanta, GA 30303-3506
United States of America
And now the parliament address for the country of Georgia:
Rustaveli Avenue 8
The two barely appear related. These issues occur even with simple city names. You wouldn't think twice in the United States about "New York, New York". Yet, in Germany, you won't find anyone referring to "Berlin, Berlin". Like New York City, Berlin is part of a state with the same name. However, since the boundaries of Berlin (the state) and Berlin (the city) are the same, it's simply called "Berlin."
We wrote about this issue in a blog post from 2014: "Berlin, Berlin" is technically correct in some theoretical sense, but stylistically it is wrong. It's the type of little detail that makes a digital service feel clunky and non-local.
To solve the problem, we created a templating format to describe the local structure of place names. We open sourced the address formatting templates we wrote to avoid those results that "feel wrong". Every country has a template that can help you return results that feel right.
OpenCage's reverse geocoder returns the individual components of each result.
In addition, we apply the local template and provide a well-formatted string
portion of each API result.
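To show the idea of per-country templates, here is a toy formatter. It is not the format of the open source templates mentioned above. The template strings, component keys, and `format_address` function are all hypothetical, and only two countries are covered. The sample German address is illustration data.

```python
import re

# Toy templates for this sketch; component order differs by country.
TEMPLATES = {
    "US": "{house_number} {road}\n{city}, {state_code} {postcode}\n{country}",
    "DE": "{road} {house_number}\n{postcode} {city}\n{country}",
}
DEFAULT_TEMPLATE = "{road} {house_number}\n{city}\n{country}"

class _Blank(dict):
    """Mapping that renders missing components as empty strings."""
    def __missing__(self, key):
        return ""

def format_address(components, country_code):
    """Fill the country's template, tidying up gaps left by missing parts."""
    template = TEMPLATES.get(country_code, DEFAULT_TEMPLATE)
    lines = []
    for line in template.split("\n"):
        text = line.format_map(_Blank(components))
        text = re.sub(r"\s+", " ", text).strip(" ,")
        if text:
            lines.append(text)
    return "\n".join(lines)

us = {
    "road": "Martin Luther King Junior Drive Southeast",
    "city": "Atlanta", "state_code": "GA", "postcode": "30303-3506",
    "country": "United States of America",
}
de = {
    "road": "Platz der Republik", "house_number": "1",
    "postcode": "11011", "city": "Berlin", "country": "Germany",
}
print(format_address(us, "US"))
print(format_address(de, "DE"))
```

The cleanup step matters as much as the template: real data often lacks a house number or postal code, and a naive fill would leave stray commas and spaces.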
Geographic Edge Cases Abound
Language and formatting have plenty of variation, but these issues don't stop there. Building a reverse geocoder will help you uncover plenty more small inconsistencies. Each time you come upon one, you can either apply a custom fix for it (adding to the code required to run your geocoder) or ignore it (adding to the cognitive overload for your users).
For example, Washington DC is part of the United States. However, it is specifically not considered a state. If your results display "District of Columbia" as if it's a state, they would be inaccurate. It must be treated differently than any other city in the United States.
Here's the thing: almost every country has some historical anomaly like this. Everywhere you look, there are geographic edge cases to consider. And as we learned with formatting, when you don't catch these issues, users simply see the results as wrong.
The topic gets even hotter when you consider disputed territories. In some cases, the name you use ("Myanmar" vs "Burma") may say more than you intend to by providing a label. In other circumstances, two countries may lay claim to the same land along a border. You need to decide which country name to return - and whether to use different results based on other factors about the request.
A truly worldwide reverse geocoder will need to consider all of these touchy issues and many, many more, all while keeping data and code updated.
Maintain and Improve Your Geocoder
A lot of effort goes into building a reverse geocoder. At a minimum, you must source data and have a way to query that data. As we've seen, you also need to determine what data is returned and how it's formatted. Finally, every decision you make, data you include, and code you write must be maintained for as long as you need to run the geocoder.
Even proprietary data requires maintenance. You'll need to download and install updates. If you built the dataset yourself, you'll need some mechanism to report bugs and determine what requires a fix.
Likely, at least part of your geocoder is based on open data, such as OSM. It makes sense to use this Wikipedia-like source. However, like the online encyclopedia, OSM is ever-evolving. To stay accurate, you need to frequently refresh your reverse geocoder's data.
Apply OpenStreetMap Diffs
There are millions of OSM updates every single day, as people around the world improve the map. There are millions of OSM editor accounts and many thousands are active every month. Whether updates, corrections, or brand new data, these edits continue to grow.
Once you've downloaded the OSM data you need, you will need to regularly apply "changesets", or diffs, of the database. These downloads are much smaller and contain a description of only the data that has changed since the previous update. If you go too long between updates, however, refreshing your data can take multiple days. To keep your OSM-driven reverse geocoder updated, you'll want a plan for overseeing and producing these updates on a continual basis.
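Conceptually, applying diffs means replaying a sequence of create, modify, and delete actions in order. The sketch below shows that idea on an in-memory dictionary. The data layout and function names are hypothetical and only loosely modeled on the actions found in OSM change files. Real installations use the update tooling that ships with their database import software.

```python
def apply_diff(features, diff):
    """Apply one diff in place; features maps feature id to feature data."""
    for action, feature in diff["changes"]:
        if action in ("create", "modify"):
            features[feature["id"]] = feature
        elif action == "delete":
            features.pop(feature["id"], None)
        else:
            raise ValueError(f"unknown action: {action}")

def catch_up(features, last_applied, diffs):
    """Apply pending diffs in sequence order; return the new sequence number."""
    for diff in sorted(diffs, key=lambda d: d["sequence"]):
        if diff["sequence"] <= last_applied:
            continue  # already applied
        if diff["sequence"] != last_applied + 1:
            # A gap means data would silently go stale, so stop loudly.
            raise RuntimeError(f"missing diff {last_applied + 1}")
        apply_diff(features, diff)
        last_applied = diff["sequence"]
    return last_applied

# Hypothetical starting state and two pending diffs.
features = {1: {"id": 1, "name": "Old Cafe"}, 2: {"id": 2, "name": "Bench"}}
diffs = [
    {"sequence": 102, "changes": [("delete", {"id": 2})]},
    {"sequence": 101, "changes": [("modify", {"id": 1, "name": "New Cafe"}),
                                  ("create", {"id": 3, "name": "Library"})]},
]
latest = catch_up(features, last_applied=100, diffs=diffs)
print(latest, sorted(features))
```

Tracking the last applied sequence number is what makes the process resumable, and the gap check is why falling far behind turns a small routine job into a long one.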
Be a Good Open Source and Open Data Citizen
Keep in mind what makes OSM and other open data projects work is the community of contributors. If you build your reverse geocoder off of OSM data or software, you want to do your part to give back.
There are many ways to get involved in open data and OpenStreetMap.
Here are just a few ideas:
- Submit edits to OSM (see our tutorial)
- Join the OSM Foundation
- Participate in community discussions and events
- Submit bugs and patches to open source geo software projects
- Open source your tools and data
Time is Your Most Expensive Resource
For most companies, developer time is extremely expensive. At the very least, it's finite. You need to decide where to focus that time. Should it be spent reinventing the reverse geocoding wheel or improving the experience of your core product?
There are many open source and open data tools that can help you build a reverse geocoder. As you've seen in this guide, doing this well requires a lot of effort. If you're a geo geek like us, you might want to give it a shot yourself. But we've found most developers just want to get the data so they can build something original. That's why we built a service that makes reverse geocoding easy for developers.
To give OpenCage a shot, we recommend you save yourself even more time and use one of our libraries. We provide SDKs in over 30 programming languages to get you started quickly.