Stata Geocoding Tutorial

Before we dive into the tutorial ...

This is a tutorial for using the OpenCage Geocoding API in Stata. Before you can query the API you will need to sign up for an OpenCage API key.

Once you've done that we recommend you spend five minutes on:

Ok, ready?

opencagegeo is a Stata module written by Lars Zeigermann to access the OpenCage Geocoding API. You can find the newest version here.

Install (or update) opencagegeo

* Install the Stata module and two required user-written Stata libraries from SSC:
. ssc install opencagegeo
. ssc install libjson
. ssc install insheetjson

* If you already have opencagegeo installed, make sure you have the newest version
. adoupdate opencagegeo, update

Batch geocode addresses (forward geocoding)

* If you have a dataset of addresses stored in a single string variable 'address'
. opencagegeo, key(YOUR-API-KEY) fulladdress(address)

* If your addresses are stored in separate variables, e.g. house number in 'num', street name in 'str', city in 'city', and country in 'ctry':
. opencagegeo, key(YOUR-API-KEY) number(num) street(str) city(city) country(ctry)

Batch geocode coordinates (reverse geocoding)

* To geocode coordinates stored in a single variable 'coords' in the following format: latitude,longitude
. opencagegeo, key(YOUR-API-KEY) coordinates(coords)

* If your coordinates are stored in two separate variables 'lat' and 'lng'
. opencagegeo, key(YOUR-API-KEY) latitude(lat) longitude(lng)

Geocoding a single address or pair of coordinates

To geocode a single address or pair of coordinates, you can use opencagegeoi, the immediate version of opencagegeo.
* First you need to save your API key to a global macro 'mykey'
. global mykey YOUR-API-KEY
. opencagegeoi YOUR-ADDRESS-HERE
. opencagegeoi YOUR-LATITUDE,YOUR-LONGITUDE

Learn more

. help opencagegeo

Troubleshooting common problems

  • Unfortunately, Stata struggles to parse API responses that contain place names with apostrophes, for example the Earl's Court area of London. The apostrophes in our JSON response cause Stata's JSON parsing engine to fail, which in turn causes the program to stop.

    Adding to the confusion, the opencagegeo module then unhelpfully falls back to a default error message, which says

    Invalid key, rate limit exceeded or no internet connection

    which is simply incorrect. You can test your API key by clicking on the "Sample request using this key" link in your account dashboard.

    So, how can you solve this and go forward with your geocoding? The only solution we have found is to determine which of your queries produces the problematic response and exclude that query from your data set. A tedious process, we know, and we sympathise.

    The other option is to use another programming language like R, Python, Matlab, etc. Happily we have tutorials for all of those, but we also appreciate that it is not easy to jump to another language.

    Sorry. We welcome all suggestions as to how to prevent this bug. If anyone from StataCorp is reading this, please get in touch and we can supply examples. Stata is the only language where this seems to happen.

  • In older versions of this software there is an optional parameter paidkey which needs to be set if you are an OpenCage customer, so that the software can deal with the slight difference in format between free trial and paid responses. This is not needed in the newest version.
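
As a starting point for tracking down the problematic queries described in the first item above, one heuristic is to flag input rows that themselves contain an apostrophe. The do-file sketch below is an illustration under stated assumptions, not part of opencagegeo: the sample addresses and the variable name has_apostrophe are made up, and 'address' is the placeholder variable used earlier in this tutorial. A query without an apostrophe can still return a place name that has one, so this narrows the search rather than guaranteeing a fix.

```stata
* Sketch only: illustrative data, replace with your own dataset.
clear
input str60 address
"Earl's Court Road, London, UK"
"Trafalgar Square, London, UK"
"King's Cross, London, UK"
"Alexanderplatz, Berlin, Germany"
end

* Flag queries that contain an apostrophe (hypothetical helper variable).
generate byte has_apostrophe = strpos(address, "'") > 0

* Inspect the flagged rows before deciding what to do with them.
list address if has_apostrophe

* Set the flagged rows aside so the remaining rows can be geocoded.
* Save a copy of the full dataset first if you need these rows later.
drop if has_apostrophe
```

After this, the batch command shown earlier (opencagegeo, key(YOUR-API-KEY) fulladdress(address)) can be run on the remaining rows. If the error still appears, the offending response comes from a query without an apostrophe, and the rows have to be narrowed down by hand.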