OSGeo Events, FOSS4G 2008


A data model for efficient address data representation - Lessons learnt from the Intiendo address matching tool

Abdullah Al Rahed, Serena Coetzee, Magnus Rademeyer

Building: Cape Town International Convention Centre
Room: Langjan Room (Room 2.6a)
Date: 2008-10-01 08:30 AM – 10:00 AM
Last modified: 2008-09-01

Abstract


Addresses are often structured into a spatial hierarchy that describes a location with increasing accuracy. In the address '14 Richmond Road, Mowbray, Cape Town, South Africa', the spatial accuracy increases from country (South Africa) to city (Cape Town) to suburb (Mowbray) to street (Richmond Road) to street number (14). The international standard ISO 19112 - Spatial referencing by geographic identifiers provides a general model for spatial referencing using geographic identifiers and defines the components of a spatial reference system. In the ISO 19112 model, a spatial reference system using geographic identifiers comprises a related set of one or more location types, together with their corresponding geographic identifiers. The location types may be related to each other through aggregation or disaggregation, possibly forming a hierarchy. This general model is applicable to an address structured into a spatial hierarchy, as in the address above: country, city, suburb, street and street number are the location types, each with its own set of geographic identifiers. For example, the city names of South Africa are the geographic identifiers of the 'city' location type, and 'Cape Town' is the geographic identifier of that location type that describes a specific location. The British address standard, BS 7666-0:2006 - Spatial datasets for geographical referencing, is based on this notion of a spatial reference system comprising a hierarchy of location types.
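The hierarchy of location types and geographic identifiers described above can be illustrated with a minimal Python sketch. The class names `LocationType` and `LocationInstance`, and their fields, are assumptions made for illustration only; they are not taken from the ISO 19112 schema or from the Intiendo implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LocationType:
    """A location type; `parent` is the coarser type it aggregates into."""
    name: str
    parent: Optional["LocationType"] = None


@dataclass(frozen=True)
class LocationInstance:
    """A specific location, named by a geographic identifier."""
    identifier: str
    location_type: LocationType
    parent: Optional["LocationInstance"] = None

    def path(self) -> List[str]:
        """Return the identifiers from the finest level to the coarsest."""
        node: Optional["LocationInstance"] = self
        identifiers: List[str] = []
        while node is not None:
            identifiers.append(node.identifier)
            node = node.parent
        return identifiers


# Location types of the example address, coarsest first.
COUNTRY = LocationType("country")
CITY = LocationType("city", COUNTRY)
SUBURB = LocationType("suburb", CITY)
STREET = LocationType("street", SUBURB)
STREET_NUMBER = LocationType("street number", STREET)

# Geographic identifiers for '14 Richmond Road, Mowbray, Cape Town, South Africa'.
south_africa = LocationInstance("South Africa", COUNTRY)
cape_town = LocationInstance("Cape Town", CITY, south_africa)
mowbray = LocationInstance("Mowbray", SUBURB, cape_town)
richmond_road = LocationInstance("Richmond Road", STREET, mowbray)
number_14 = LocationInstance("14", STREET_NUMBER, richmond_road)

print(number_14.path())
```

Walking the `parent` links from the street number upwards recovers the address components in order of decreasing accuracy.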
Geocoding refers to the process of assigning geographic identifiers and/or geographic coordinates to the description of a feature location, i.e. the words, codes or terms that describe a feature's location. Address matching is the specific case of geocoding where the description of a feature location comprises an address. A geocoding service receives the description of the feature location, e.g. an address, as input and searches for a matching address in a reference dataset. Address matching is complicated when the incoming address is incomplete or inaccurate, or when it contains a misleading geographic identifier in its location type hierarchy. If the address matching relies only on the location type hierarchy and alphanumeric matching, an incoming '101 Rubida Street, Willows' will be incorrectly matched to '110 Rubida Street, Willows' rather than to the more accurate '101 Rubida Street, Murrayfield', which lies on the opposite side of the road.
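The failure case above can be reproduced with a small Python sketch of a purely hierarchical, alphanumeric matcher. This is an illustrative assumption about how such a matcher might behave, not the algorithm of any particular geocoding service; the two-record reference dataset and the function names are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical reference dataset holding the two addresses from the example.
REFERENCE = [
    {"number": "110", "street": "Rubida Street", "suburb": "Willows"},
    {"number": "101", "street": "Rubida Street", "suburb": "Murrayfield"},
]


def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_top_down(incoming, reference):
    """Narrow the candidates level by level, from coarsest to finest.

    At each level only the candidates with the best alphanumeric
    similarity survive, so a misleading suburb removes the correct
    address before the street number is ever compared.
    """
    candidates = list(reference)
    for level in ("suburb", "street", "number"):
        scores = [similarity(incoming[level], c[level]) for c in candidates]
        best = max(scores)
        candidates = [c for c, s in zip(candidates, scores) if s == best]
    return candidates[0]


incoming = {"number": "101", "street": "Rubida Street", "suburb": "Willows"}
result = match_top_down(incoming, REFERENCE)
print(result["number"], result["street"] + ",", result["suburb"])
```

Because the suburb 'Willows' is matched first, the record in Murrayfield is discarded and the matcher settles for the nearest street number string within Willows.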
The Intiendo address matching tool is based on a data structure that is similar to the hierarchy of location types described in ISO 19112, but we have made some novel extensions to enable a spatial adjacency search. Intiendo does not rely purely on the alphanumeric match in the location type hierarchy but incorporates spatial proximity into the address matching process, so that the address above would be matched correctly to '101 Rubida Street, Murrayfield'. We also describe how the Intiendo address matching process can be configured and fine-tuned, for example by assigning weights to the location types in the hierarchy and by specifying parameters for the spatial adjacency match.
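One way to picture weighted location types combined with a spatial adjacency match is the following Python sketch. It is not the Intiendo algorithm: the weights, the adjacency credit, the adjacency table (which simply assumes that Willows and Murrayfield border each other) and all function names are hypothetical values chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical reference dataset holding the two addresses from the example.
REFERENCE = [
    {"number": "110", "street": "Rubida Street", "suburb": "Willows"},
    {"number": "101", "street": "Rubida Street", "suburb": "Murrayfield"},
]

# Assumed configuration: per-level weights and a partial credit that is
# granted when the incoming suburb borders the candidate's suburb.
WEIGHTS = {"number": 0.4, "street": 0.4, "suburb": 0.2}
ADJACENCY_CREDIT = 0.8
ADJACENT_SUBURBS = {frozenset({"Willows", "Murrayfield"})}


def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def suburb_score(incoming_suburb, candidate_suburb):
    """Full credit for the same suburb, partial credit for a neighbour."""
    if incoming_suburb.lower() == candidate_suburb.lower():
        return 1.0
    if frozenset({incoming_suburb, candidate_suburb}) in ADJACENT_SUBURBS:
        return ADJACENCY_CREDIT
    return 0.0


def score(incoming, candidate):
    """Weighted sum over all levels; no level can eliminate a candidate."""
    return (
        WEIGHTS["number"] * similarity(incoming["number"], candidate["number"])
        + WEIGHTS["street"] * similarity(incoming["street"], candidate["street"])
        + WEIGHTS["suburb"] * suburb_score(incoming["suburb"], candidate["suburb"])
    )


def best_match(incoming, reference):
    """Return the reference address with the highest weighted score."""
    return max(reference, key=lambda candidate: score(incoming, candidate))


incoming = {"number": "101", "street": "Rubida Street", "suburb": "Willows"}
match = best_match(incoming, REFERENCE)
print(match["number"], match["street"] + ",", match["suburb"])
```

With these assumed settings, the exact street number in the neighbouring suburb outweighs the exact suburb with an inexact street number. Changing the weights or the adjacency credit shifts that balance, which is the kind of tuning the paragraph above refers to.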
In this paper we present the hierarchical data structures of the Intiendo address matching tool and show how they are an extended implementation of the ISO 19112 general model. We show the similarities between the Intiendo and ISO 19112 models, and present the extensions that were implemented in Intiendo. By way of examples, we show that our extended model allows more efficient and accurate address matching by incorporating spatial adjacency and hierarchical fine-tuning.
