Feature Matching Across Polygon Sets with geomatchR: An Application to Historical Census Geography
Identifying whether polygons in different spatial databases correspond to the same underlying features is an enduring problem in spatial analysis. Past researchers have developed a variety of workflows. Some have focused on feature overlap, for example, assessing the degree to which a given vector polygon (e.g., an administrative boundary) or raster feature corresponds to features in a separate raster dataset (e.g., land cover categories) at multiple levels of geographic or categorical aggregation (Hargrove, Hoffman, and Hessburg 2006; Sadahiro and Oguchi 2015; Power, Simms, and White 2001; Hagen-Zanker 2006). Others have focused on matching the boundary line segments that define polygons (Gomboši, Žalik, and Krivograd 2003; Masuyama 2006). Still others have borrowed concepts and techniques from computer science and mathematical graph theory (Nowosad and Stepinski 2018; Dias and Silver 2021). Many of these approaches are complex and computationally expensive to implement, and some generate global measures of association between maps (Nowosad and Stepinski 2018) rather than indicating whether specific features correspond to one another. We have developed geomatchR, a mathematically straightforward algorithm implemented in R that compares two overlapping sets of contiguous polygons and identifies which polygons are the best matches for one another. We demonstrate the technique on a newly assembled collection of historical Canadian Census boundaries spanning the 1851–2021 period. To evaluate its effectiveness, we compare its output with that of Hargrove, Hoffman, and Hessburg's (2006) conceptually similar Mapcurves algorithm and with a hand-coded lookup table matching Census geographic units across years. We conclude with a discussion of limitations and potential refinements.
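To make the overlap-based comparison concrete, the following is a minimal sketch of a Mapcurves-style score computed from precomputed intersection areas. It follows our reading of Hargrove, Hoffman, and Hessburg (2006), in which each overlapping pair contributes the product of two proportions, (C / area of A_i) × (C / area of B_j), where C is the intersection area, and a polygon's goodness of fit is the sum of these terms. The function names, the dictionary input format, and the rule of taking the single highest-scoring pair as a polygon's "best match" are hypothetical illustrations. They are not geomatchR's actual criterion, and the intersection areas are assumed to come from a prior GIS overlay.

```python
"""Hypothetical Mapcurves-style pair scoring (illustrative sketch only).

Assumes intersection areas between polygons in set A and set B have
already been computed by a GIS overlay, and that every polygon area is > 0.
This is not geomatchR's matching rule.
"""
from collections import defaultdict


def mapcurves_pair_scores(area_a, area_b, overlaps):
    """Score each overlapping pair (i, j) as (C / |A_i|) * (C / |B_j|).

    area_a, area_b: dict mapping polygon id -> total polygon area.
    overlaps: dict mapping (id_a, id_b) -> intersection area C.
    """
    scores = {}
    for (i, j), c in overlaps.items():
        if c > 0:  # pairs that do not intersect contribute nothing
            scores[(i, j)] = (c / area_a[i]) * (c / area_b[j])
    return scores


def mapcurves_gof(area_a, area_b, overlaps):
    """Goodness of fit of each A polygon against set B: the sum of its
    per-pair scores (1.0 only when it coincides exactly with one B polygon)."""
    gof = defaultdict(float)
    for (i, _j), s in mapcurves_pair_scores(area_a, area_b, overlaps).items():
        gof[i] += s
    return dict(gof)


def best_matches(area_a, area_b, overlaps):
    """Hypothetical rule: map each A polygon to its highest-scoring B polygon."""
    best = {}
    for (i, j), s in mapcurves_pair_scores(area_a, area_b, overlaps).items():
        if i not in best or s > best[i][1]:
            best[i] = (j, s)
    return {i: j for i, (j, _s) in best.items()}


# Toy example: two units per year whose boundaries shifted between censuses.
area_a = {"a1": 10.0, "a2": 10.0}
area_b = {"b1": 12.0, "b2": 8.0}
overlaps = {("a1", "b1"): 9.0, ("a1", "b2"): 1.0,
            ("a2", "b1"): 3.0, ("a2", "b2"): 7.0}

print(best_matches(area_a, area_b, overlaps))
print(mapcurves_gof(area_a, area_b, overlaps))
```

In this toy example a1 pairs with b1 (score 0.9 × 0.75 = 0.675) and a2 pairs with b2 (0.7 × 0.875 = 0.6125). Because each term multiplies the share of both polygons covered by the overlap, a large polygon that merely contains a small one scores lower than two polygons of similar extent that coincide.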