Geographic Text Search
Corporate Proprietary, Copyright 1999-2003, MetaCarta, Inc.
On building a high performance gazetteer database
Amittai Axelrod
MetaCarta Inc
Thanks to
Keith Baker
Kenneth Baker
Michael Bukatin
András Kornai
Plan of the talk
• Database background
• Relating geographic names and features
• Handling ambiguities and inconsistencies in geographic names
• Classification and storage system for geographic features
Databases
• No DB (faking it with flat files) -- clumsy
• Record-oriented -- still runs the world
• Relational -- making headway
• Object-oriented -- still very academic
• For MetaCarta GazDB, relational approach made most sense:
  • Overlapping records (McKinley/Denali)
  • Need for frequent updates of subparts of records
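To illustrate why a relational layout suits overlapping records, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`feature`, `name`, `feature_name`) and the approximate coordinates are assumptions made for illustration; they are not the GazDB schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feature (
        feature_id INTEGER PRIMARY KEY,
        lat REAL,
        lon REAL
    );
    CREATE TABLE name (
        name_id INTEGER PRIMARY KEY,
        text TEXT NOT NULL
    );
    -- Link table: one feature may carry several names and vice versa.
    CREATE TABLE feature_name (
        feature_id INTEGER REFERENCES feature(feature_id),
        name_id INTEGER REFERENCES name(name_id),
        PRIMARY KEY (feature_id, name_id)
    );
""")

# One mountain, two names: the feature row is stored only once.
conn.execute("INSERT INTO feature VALUES (1, 63.07, -151.01)")  # approximate
conn.executemany("INSERT INTO name VALUES (?, ?)",
                 [(1, "Mount McKinley"), (2, "Denali")])
conn.executemany("INSERT INTO feature_name VALUES (?, ?)", [(1, 1), (1, 2)])


def names_for(feature_id):
    """Return every name linked to the given feature, in insertion order."""
    rows = conn.execute(
        "SELECT n.text FROM name n "
        "JOIN feature_name fn ON fn.name_id = n.name_id "
        "WHERE fn.feature_id = ? ORDER BY n.name_id",
        (feature_id,),
    )
    return [r[0] for r in rows]


# Updating a subpart of the record touches a single row in one table.
conn.execute("UPDATE name SET text = ? WHERE name_id = ?", ("Mt. McKinley", 1))
print(names_for(1))
```

In a flat-file or record-oriented layout, the same change would typically mean rewriting the whole record, or every copy of it.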
Gazetteer production process
Conversion scripts
• Enforce uniform structure on the data
• Normalize across sources (e.g. lat/lon to decimal degrees, spelling, …)
• Configuration required once per source
• Load data into GazDB
• Combination of Perl and SQL
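As one example of the normalization step, the sketch below converts degrees/minutes/seconds coordinates to decimal degrees. The production scripts are described as perl/SQL; this Python version, the `DD MM SS H` input format, and the record field names are assumptions chosen for illustration.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative in decimal notation.
    return -value if hemisphere in ("S", "W") else value


def parse_dms(text):
    """Parse a hypothetical 'DD MM SS H' string, e.g. '63 30 00 N'."""
    d, m, s, h = text.split()
    return dms_to_decimal(int(d), int(m), float(s), h.upper())


def normalize_record(raw):
    """Give a source record the uniform shape expected by the loader."""
    return {
        "name": " ".join(raw["name"].split()),  # collapse stray whitespace
        "lat": round(parse_dms(raw["lat"]), 6),
        "lon": round(parse_dms(raw["lon"]), 6),
    }


record = normalize_record(
    {"name": "  Example   Peak ", "lat": "63 30 00 N", "lon": "151 00 36 W"}
)
print(record)
```

Each source would need its own parser like `parse_dms`, written once, while `normalize_record` keeps the output uniform across sources.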
Relating features and names
Other tables used in GazDB
• Population
• Elevation
• Language
• Feature type
• Source/versioning info
• Temporal extent
• Hierarchical information
• Confidence
• Comments
• Change logs (full auditing)
Geographic names
• Internationalization
  • Full Unicode (UTF-8) support
  • Maintain detailed language information (SIL)
• Name resolution
  • Canonical form (16 bit)
  • Display form (8 bit)
  • Search form (6 bit)
• Authoritativeness
• Explicitness
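The three name forms can be pictured as progressively coarser encodings of the same name. The slides do not say how each form is derived, so the sketch below is only one plausible reading: the canonical form is the full Unicode string, the display form keeps whatever fits in Latin-1 (8 bit), and the search form folds to an alphabet of at most 64 symbols (6 bit). The function names and `SEARCH_ALPHABET` are hypothetical.

```python
import unicodedata

# Assumed search alphabet; anything with at most 64 symbols fits in 6 bits.
SEARCH_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 -'."


def strip_marks(text):
    """Remove combining marks (accents) after compatibility decomposition."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def display_form(canonical):
    """Assumed 8-bit form: keep Latin-1 characters, fold the rest to base letters."""
    out = []
    for ch in unicodedata.normalize("NFC", canonical):
        out.append(ch if ord(ch) < 256 else strip_marks(ch))
    # Anything still outside Latin-1 (e.g. other scripts) becomes '?'.
    return "".join(out).encode("latin-1", errors="replace").decode("latin-1")


def search_form(canonical):
    """Assumed 6-bit form: unaccented, upper case, restricted alphabet."""
    folded = strip_marks(canonical).upper()
    kept = "".join(ch if ch in SEARCH_ALPHABET else " " for ch in folded)
    return " ".join(kept.split())


for name in ("São Paulo", "Győr"):
    print(name, "|", display_form(name), "|", search_form(name))
```

Under these assumptions a query typed without accents still matches, while the display form keeps as much of the original spelling as an 8-bit character set allows.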
Updating a name in the GazDB
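The diagram for this slide did not survive extraction. As a rough illustration of a name update that leaves an audit trail, the sketch below uses an SQLite trigger to write a change-log row for every edit. The `name` and `change_log` tables, the trigger, and the sample spelling correction are assumptions, not the GazDB procedure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE name (
        name_id INTEGER PRIMARY KEY,
        text TEXT NOT NULL
    );
    CREATE TABLE change_log (
        log_id INTEGER PRIMARY KEY,
        name_id INTEGER NOT NULL,
        old_text TEXT,
        new_text TEXT,
        changed_at TEXT DEFAULT CURRENT_TIMESTAMP
    );
    -- Record every change to a name so that edits can be audited later.
    CREATE TRIGGER log_name_update AFTER UPDATE OF text ON name
    BEGIN
        INSERT INTO change_log (name_id, old_text, new_text)
        VALUES (OLD.name_id, OLD.text, NEW.text);
    END;
""")
conn.execute("INSERT INTO name (name_id, text) VALUES (1, 'Cambrige')")


def update_name(name_id, new_text):
    """Change one name; the trigger logs the old and new values."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "UPDATE name SET text = ? WHERE name_id = ?", (new_text, name_id)
        )


update_name(1, "Cambridge")
log = conn.execute(
    "SELECT name_id, old_text, new_text FROM change_log"
).fetchall()
print(log)
```

Putting the logging in the database rather than in each script means an update cannot bypass the audit trail.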
Geographic features
• Spatial representations
  • Point, line, area, …
• Functional classes
  • Building, field, campus, city, …
• Administrative types
  • Nation, province, county, international org, …
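One way to read this classification is as three independent axes, so that a feature's geometry, function, and administrative role are recorded separately. The sketch below models that reading with enums. The type names are hypothetical, the members cover only the examples listed above, and treating the axes as independent is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Spatial(Enum):
    POINT = "point"
    LINE = "line"
    AREA = "area"


class FunctionalClass(Enum):
    BUILDING = "building"
    FIELD = "field"
    CAMPUS = "campus"
    CITY = "city"


class AdminType(Enum):
    NATION = "nation"
    PROVINCE = "province"
    COUNTY = "county"
    INTERNATIONAL_ORG = "international org"


@dataclass(frozen=True)
class FeatureClass:
    """Classification of one feature along three separate axes."""
    spatial: Spatial
    functional: Optional[FunctionalClass] = None
    admin: Optional[AdminType] = None


# A city may be stored as a point in one source and as an area in another.
city_as_point = FeatureClass(Spatial.POINT, functional=FunctionalClass.CITY)
city_as_area = FeatureClass(Spatial.AREA, functional=FunctionalClass.CITY)
county = FeatureClass(Spatial.AREA, admin=AdminType.COUNTY)
print(city_as_point.functional is city_as_area.functional)
```

Keeping the axes separate avoids a combinatorial list of types such as "city-point" and "city-area".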
Export scripts
• Read GazDB
• Select which fields to include in custom output
• Create .gbdm (MetaCarta format) binaries
• Combination of Perl and SQL
• Not yet general across binary output formats
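The export step can be pictured as choosing a subset of fields and packing them into fixed-width binary rows. The .gbdm layout is not described in these slides, so the sketch below uses a made-up record format built with Python's `struct` module; `FIELD_FORMATS`, the field names, and the sample values are all illustrative assumptions.

```python
import struct

# Hypothetical per-field encodings; this is NOT the .gbdm layout.
FIELD_FORMATS = {
    "feature_id": "I",  # unsigned 32-bit integer
    "lat": "d",         # 64-bit float
    "lon": "d",         # 64-bit float
    "population": "I",  # unsigned 32-bit integer
}


def export_records(records, fields):
    """Pack the selected fields of each record into little-endian binary rows."""
    # '<' selects little-endian byte order with standard sizes and no padding.
    row = struct.Struct("<" + "".join(FIELD_FORMATS[f] for f in fields))
    return b"".join(row.pack(*(rec[f] for f in fields)) for rec in records)


# Illustrative values only, not gazetteer data.
records = [
    {"feature_id": 1, "lat": 63.07, "lon": -151.01, "population": 0},
    {"feature_id": 2, "lat": 42.37, "lon": -71.11, "population": 100000},
]

# A custom output that leaves the population field out.
blob = export_records(records, ["feature_id", "lat", "lon"])
print(len(blob))
```

Because the field list is a parameter, the same reader can produce differently shaped binaries for different purposes.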
Conclusions
• Accept multiple sources (only configure once per source)
• Fast loading of large datasets (1 million entries per hour on a Linux desktop)
• Simple update procedure
• Outputting large custom binary gazetteers for different purposes at extreme speeds (1 million entries per minute)