Address standardization using the natural language process for improving geocoding results
Abstract
Geocoding is a tool of growing importance that can be used in many areas, such as the development of disaster prevention systems, crime mapping and the monitoring of communicable diseases. However, geocoding cannot yet be used in some areas where it could serve as an effective tool, for reasons that include inconsistencies in address formats (inaccurate numbering systems, misspellings and the use of abbreviations) and a lack of reference data for the geocoding process. This study seeks to address these problems through a standardization process. To that end, it employs a method based on Natural Language Processing (NLP) that decomposes the addresses used as input data in geocoding, identifies spelling mistakes and abbreviations, and reorganizes the addresses. The test data are the addresses of primary schools in the district of Eskisehir. First, the data set is geocoded using both the Google geocoding API and the ArcGIS geocoding API. Then, the addresses are reformatted into three address formats by applying standardization processes. The reformatted addresses are geocoded, and the results are compared with the non-standardized results. The standardization is shown to significantly improve the accuracy of the geocoding results. The method used in this study is significant not only for increasing the accuracy of the geocoding process, but also for sustaining its wider use.
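The standardization steps described above (decomposing an address, expanding abbreviations, repairing misspellings and reassembling the result) can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the abbreviation table, the vocabulary, the similarity cutoff of 0.8, the function names and the sample address are all hypothetical, and the fuzzy matching uses the standard-library difflib module as a stand-in for whatever NLP technique the study applies.

```python
import re
from difflib import get_close_matches

# Hypothetical abbreviation table (illustrative only, not taken from the study).
ABBREVIATIONS = {
    "mah": "Mahallesi",
    "cad": "Caddesi",
    "cd": "Caddesi",
    "sok": "Sokak",
    "sk": "Sokak",
    "blv": "Bulvari",
}

# Hypothetical vocabulary of correctly spelled address keywords.
VOCABULARY = ["Mahallesi", "Caddesi", "Sokak", "Bulvari"]


def normalize_token(token):
    """Expand a known abbreviation or repair a near-miss spelling."""
    key = token.rstrip(".").lower()
    if key in ABBREVIATIONS:
        return ABBREVIATIONS[key]
    # Attempt spelling repair only on longer alphabetic tokens, so that
    # short words and house numbers are left untouched.
    if token.isalpha() and len(token) >= 5:
        match = get_close_matches(token.capitalize(), VOCABULARY, n=1, cutoff=0.8)
        if match:
            return match[0]
    return token


def standardize(address):
    """Decompose an address into tokens, normalize each one, and rejoin."""
    # Rewrite door-number fragments such as "no:12" or "No: 12" as "No 12".
    address = re.sub(r"\bno\s*[:.]?\s*(\d+)", r"No \1", address, flags=re.IGNORECASE)
    # Split on commas and whitespace to obtain the address components.
    tokens = re.split(r"[,\s]+", address.strip())
    return " ".join(normalize_token(t) for t in tokens if t)


if __name__ == "__main__":
    print(standardize("Yenikent mah. Ataturk cad. no:12"))
```

Under these assumptions, both "Yenikent mah. Ataturk cad. no:12" and the misspelled variant "Yenikent Mah, Ataturk Cadesi No: 12" are rewritten as "Yenikent Mahallesi Ataturk Caddesi No 12". A fixed similarity cutoff can also alter legitimate place names that happen to resemble a keyword, so a real pipeline would need a more careful vocabulary and validation against reference data.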