An enhanced UN/LOCODE dataset with significant improvements:
The main reason this project exists is that the coordinates in the original UN/LOCODE list have three major problems:
1. Only 80% of locations have coordinates
The missing ones aren't just tiny villages; they include some of the world's most important cities, such as London (GBLON), Madrid (ESMAD), Luxembourg (LULUX) and Milano (ITMIL).
2. Many coordinates are just wrong
Quite a few coordinates contain typos (e.g. ATWIS), and many others are flat-out wrong (e.g. EGSCN).
This project aims to fix most of these cases by combining the data with data from OpenStreetMap's Nominatim API and Wikidata.
3. Multiple coordinate formats
Most UN/LOCODE coordinates look like USNYC: 4042N 07400W (degrees and minutes). However, entries in Bhutan such as BTPDL have decimal coordinates: 26.8128N 89.1903E. This project solves this with two columns: the Coordinates column now contains only the UN/LOCODE-style degrees and minutes, while the CoordinatesDecimal column has a decimal representation.
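Converting the degree-minute style to decimal degrees is simple arithmetic: degrees plus minutes divided by 60, negated for S and W. The Python sketch below assumes the fixed DDMM (latitude) / DDDMM (longitude) layout shown above and also accepts the Bhutan-style decimal strings. The function name to_decimal and the rounding to four decimal places are illustrative choices, not necessarily how the project fills its CoordinatesDecimal column.

```python
import re

# Degree-minute style (assumed fixed width): latitude DDMM + N/S, longitude DDDMM + E/W,
# e.g. "4042N 07400W" means 40 deg 42 min N, 74 deg 00 min W.
_DM_PATTERN = re.compile(r"^(\d{2})(\d{2})([NS])\s+(\d{3})(\d{2})([EW])$")
# Decimal style seen in some entries, e.g. "26.8128N 89.1903E".
_DEC_PATTERN = re.compile(r"^(\d+(?:\.\d+)?)([NS])\s+(\d+(?:\.\d+)?)([EW])$")


def to_decimal(coords: str) -> tuple[float, float]:
    """Convert a UN/LOCODE coordinate string to signed (latitude, longitude) in decimal degrees."""
    text = coords.strip()
    m = _DM_PATTERN.match(text)
    if m:
        lat_deg, lat_min, ns, lon_deg, lon_min, ew = m.groups()
        # One minute is 1/60 of a degree.
        lat = int(lat_deg) + int(lat_min) / 60
        lon = int(lon_deg) + int(lon_min) / 60
    else:
        m = _DEC_PATTERN.match(text)
        if not m:
            raise ValueError(f"unrecognised coordinate format: {coords!r}")
        lat_str, ns, lon_str, ew = m.groups()
        lat, lon = float(lat_str), float(lon_str)
    # In signed decimal degrees, South and West are negative.
    if ns == "S":
        lat = -lat
    if ew == "W":
        lon = -lon
    return round(lat, 4), round(lon, 4)
```

For example, to_decimal("4042N 07400W") gives (40.7, -74.0). Keep in mind that degree-minute values are only precise to one minute of arc (about 1.85 km in latitude), so the extra decimal places add no real precision.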
All of this is addressed in code-list-improved.csv, which contains corrected coordinates and far more of them: 98.4% of locations have coordinates, compared with about 80% in the original.
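If you want to verify the coverage figure yourself, a small Python sketch like the one below counts the rows with a non-empty Coordinates column. It assumes the file has a header row containing that column name and is UTF-8 encoded; coordinate_coverage is a hypothetical helper, not part of the dataset.

```python
import csv
from typing import Iterable


def coordinate_coverage(lines: Iterable[str], column: str = "Coordinates") -> float:
    """Return the fraction of data rows whose coordinate column is non-empty."""
    total = 0
    with_coords = 0
    for row in csv.DictReader(lines):
        total += 1
        # row.get may return None for short rows, so fall back to "".
        if (row.get(column) or "").strip():
            with_coords += 1
    return with_coords / total if total else 0.0
```

Since csv.DictReader accepts any iterable of lines, you can open the file with newline="" and encoding="utf-8" and pass the file object straight to coordinate_coverage.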
Another issue is hierarchy. For example, CNSHZ (Shanghai Hongqiao International Apt) is located in Shanghai (CNSGH), but how would you know that these are essentially the same place? Ideally, you'd want to know that the airport is in Shanghai.
For this purpose, the project provides parents.csv, which looks like this:
Unlocode,Parent
CNSHZ,CNSGH
CNPDG,CNSGH
With this file, you can easily tell that all three codes are related.
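One way to use parents.csv is to follow each code's parent links up to a top-level location and group codes by that root. This Python sketch assumes only the two columns shown above (Unlocode, Parent). The example shows a single level of nesting, so whether a parent can itself have a parent is an assumption; root_of simply keeps following links, with a guard against cycles, in case it can. All function names are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Iterable


def load_parents(lines: Iterable[str]) -> dict[str, str]:
    """Map each child UN/LOCODE to its parent, from rows shaped like parents.csv."""
    return {row["Unlocode"]: row["Parent"] for row in csv.DictReader(lines)}


def root_of(code: str, parents: dict[str, str]) -> str:
    """Follow parent links upward until reaching a code with no parent."""
    seen = {code}
    while code in parents:
        code = parents[code]
        if code in seen:  # stop on a cycle instead of looping forever
            break
        seen.add(code)
    return code


def group_by_root(parents: dict[str, str]) -> dict[str, set[str]]:
    """Group every code mentioned in the file under its top-level ancestor."""
    groups: dict[str, set[str]] = defaultdict(set)
    for code in set(parents) | set(parents.values()):
        groups[root_of(code, parents)].add(code)
    return dict(groups)
```

With the rows above, CNSHZ, CNPDG and CNSGH all end up in the group for CNSGH, so a lookup for either airport can fall back to the city it belongs to.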
If you use the official dataset, it's impossible to find out that "Vienna" and "Wien" are in fact the same city, with UN/LOCODE ATVIE.
That's no longer the case with aliases-improved.csv, which looks like this:
Unlocode,Alias
ATVIE,Wien
ATVIE,Vienna
This is much more usable than the aliases in the original, partly because of the friendlier format but mostly because of its sheer size: the official dataset has fewer than 100 aliases, while this one has over 575,000.
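With over half a million rows, a practical way to use aliases-improved.csv is to load it once into an in-memory index keyed by name. The Python sketch below assumes the Unlocode and Alias columns shown above and matches names case-insensitively via str.casefold. It returns a set because, in a list this large, the same name may belong to more than one location. The helper names are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Iterable


def build_alias_index(lines: Iterable[str]) -> dict[str, set[str]]:
    """Index rows shaped like aliases-improved.csv by case-folded alias."""
    index: dict[str, set[str]] = defaultdict(set)
    for row in csv.DictReader(lines):
        index[row["Alias"].casefold()].add(row["Unlocode"])
    return dict(index)


def lookup(name: str, index: dict[str, set[str]]) -> set[str]:
    """Return every UN/LOCODE with an alias matching name, ignoring case."""
    return index.get(name.casefold(), set())
```

For example, lookup("vienna", index) and lookup("WIEN", index) both return {"ATVIE"}.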
The United Nations Code for Trade and Transport Locations (UN/LOCODE) is a code list maintained by UNECE (a United Nations agency) to facilitate trade. The official list is published on the UNECE website and released twice a year. This dataset, however, is based on datasets/un-locode, which already improves considerably on the original (e.g. it has no encoding problems).
All UN/LOCODE data is licensed under the ODC Public Domain Dedication and Licence (PDDL).
OpenStreetMap data (via Nominatim): ODbL 1.0. http://osm.org/copyright
CC-0 (No rights reserved)
CC-0 (No rights reserved)