Tuesday, October 29, 2019

World-Historical Gazetteer: Beta Release v. 0.2

At long last we are ready to offer a v0.2 beta release of the World Historical Gazetteer (WHG) at http://dev.whgazetteer.org. We hope that spatial historians and spatio-temporal infrastructure developers will be interested in taking a look at what we are building, experimenting with their data or provided samples. It is a “sandbox,” so nothing will be saved for the time being (that will change soon). There are 5-6 months remaining in the term of our initial NEH grant, time enough to complete most of what we planned for this phase, and to incorporate more suggestions from users and potential contributors as we move toward future planning and development.
The site includes a brief guide titled “WHG Beta Release: A Tour,” which outlines what is there, what you can do and how, remaining challenges, and what is in the works. What follows is a higher level introduction.
Places and Traces
The World Historical Gazetteer is a Linked Open Data platform for publishing, linking, discovering, and visualizing contributed records of attested historical places and traces. Our initial focus has been on places, but we are working experimentally to demonstrate how the platform can integrate them with what we now call traces: web resources about historical entities for which location in time and space is of scholarly and general interest. We are considering three classes of traces for the time being: agents (people and groups), works (e.g. artifacts, texts, datasets), and events (e.g. journeys, conflicts). Our objective has been to create the first large-scale spatial infrastructure for world history: oriented toward documenting the human past at the global scale, and particularly the geography of global and transregional connections.
Our accessioning process is intended to eventually be largely self-directed; getting it to that stage means working directly and hands-on with our early contributors.
LOD Publication
Registered users of WHG can publish their place records as Linked Open Data simply by uploading them in Linked Places format (or the LP-TSV version intended for relatively simpler records). We see LOD publication as a key feature for researchers who are not in a position to stand up their own web interfaces with per-place pages. Once uploaded, each record will have a permanent URI and be accessible in our graphical interface and API, putting it on its way to being LOD in good standing. The dataset can be browsed immediately by its owner in a searchable table and map, but turning the uploaded dataset into a contribution for accessioning requires some further steps. The data needs to have as many asserted links to name authorities as possible, and augmentations of geometry where that is missing and findable. We provide reconciliation services for that purpose.
Simply put, reconciliation is the process of identifying matches between records of named entities. In this case the records are for places, and the matches are between a researcher’s records and those in existing place name authorities. So far, we provide reconciliation services for the Getty Thesaurus of Geographic Names (TGN) and Wikidata; DBpedia and GeoNames are planned.
The reconciliation process has two steps: 1) sending records to the authority, and 2) reviewing the prospective matches returned and accepting or declining them as appropriate. The results of this somewhat laborious process are 1) links, and 2) more geometry. Once augmented in this way, a dataset is ready for accessioning.
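The review step can be pictured with a minimal sketch. Everything in it is an assumption made for illustration, not WHG code: the record and candidate shapes, the function name apply_review, and the made-up identifiers. It only shows how accepting a prospective match yields the two results named above, a link and (where missing) geometry.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PlaceRecord:
    """A contributor's place record (hypothetical, simplified shape)."""
    title: str
    links: list = field(default_factory=list)
    geometry: Optional[dict] = None


@dataclass
class Candidate:
    """A prospective match returned by an authority (hypothetical shape)."""
    authority_id: str  # illustrative identifier, e.g. "tgn:0000001"
    geometry: Optional[dict] = None


def apply_review(record, candidates, accepted_ids):
    """Apply a reviewer's accept/decline decisions to one record."""
    for cand in candidates:
        if cand.authority_id not in accepted_ids:
            continue  # declined: nothing is added to the record
        if cand.authority_id not in record.links:
            record.links.append(cand.authority_id)  # result 1: a link
        if record.geometry is None and cand.geometry is not None:
            record.geometry = cand.geometry  # result 2: borrowed geometry
    return record


# Usage: one accepted candidate, one declined.
record = PlaceRecord(title="Example town")
candidates = [
    Candidate("tgn:0000001", {"type": "Point", "coordinates": [10.0, 20.0]}),
    Candidate("wd:Q0000002"),
]
apply_review(record, candidates, accepted_ids={"tgn:0000001"})
```

In this sketch geometry is borrowed only when the record has none of its own, which mirrors the idea of augmenting geometry "where that is missing and findable."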
The last step is another reconciliation effort — this time to the WHG index. Each record is compared to the growing WHG index to determine if we have a contributed attestation for the place yet or not. If we do, the incoming record becomes a “child” or “leaf” in the set of attestations for the place. If the place is not yet accounted for, the new record becomes a “parent” — the seed for a new set of attestations. At this stage, an automatic linking can be made if two records share an authority match, but the rest will have to be reviewed as described above.
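A rough sketch of that parent/child logic follows. It assumes, purely for illustration, that the index is a list of attestation sets held in memory and that each record carries a list of authority identifiers; the real WHG index lives in Elasticsearch, and the function names and data shapes here are hypothetical.

```python
def find_shared_authority(index, record):
    """Return the first attestation set sharing an authority link, if any."""
    incoming = set(record["links"])
    for attestations in index:
        known = set()
        for member in [attestations["parent"], *attestations["children"]]:
            known |= set(member["links"])
        if known & incoming:
            return attestations
    return None


def accession(index, record):
    """Auto-link on a shared authority match; otherwise flag for review."""
    match = find_shared_authority(index, record)
    if match is not None:
        match["children"].append(record)  # becomes a "child" of the set
        return "child"
    return "review"  # a person decides: child of some set, or new parent


def seed_parent(index, record):
    """Start a new set of attestations with this record as its parent."""
    index.append({"parent": record, "children": []})


# Usage with made-up identifiers.
index = []
first = {"id": "a", "links": ["tgn:0000001"]}
if accession(index, first) == "review":
    seed_parent(index, first)  # assume review found no existing match
second = {"id": "b", "links": ["tgn:0000001", "wd:Q0000002"]}
status = accession(index, second)  # shares tgn:0000001 with the parent
```

Only the shared-authority case is automated in this sketch; every other record is routed to review, as described above.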
Graphical Interface
The opening screen of WHG offers users search of places and traces. We try to offer enough context on the opening screen to identify the likeliest match. Once you identify a place of interest, clicking its name takes you to a “place portal” screen–where everything we have about the place, or linked to it in some way, will appear: attestations from contributors, associated traces, nearby places, physical geographic context (rivers, watersheds, ecoregions). The place portal is very much a work-in-progress at this stage. Several other features are also on our near-term to-do list, including advanced search; more and better maps; user data collections; project team ‘workspace’; batching of reconciliation tasks; and more.
A Word About Architecture
There are two data stores within the WHG platform: a relational database (PostgreSQL) and a high-speed index (Elasticsearch). All uploaded data gets imported to a set of relational tables whose names correspond to the elements of Linked Places format: places, place_name, place_type, place_geom, place_link, place_when, place_related, place_description, and place_depiction. Contributed data is most readily managed in that form. Upon accessioning, records are added to the index in the manner described under Reconciliation above.
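To make the table layout concrete, here is a small sketch of how one uploaded Linked Places feature might be split into rows for those tables. The table names come from the list above; the feature keys (names, types, links, relations, descriptions, depictions, geometry, when) reflect our reading of Linked Places format and should be checked against the specification, and the row shape (place_id plus a json payload) is a hypothetical simplification, not the actual schema.

```python
# Table name -> Linked Places key holding a list of items (key names assumed).
LIST_TABLES = {
    "place_name": "names",
    "place_type": "types",
    "place_link": "links",
    "place_related": "relations",
    "place_description": "descriptions",
    "place_depiction": "depictions",
}


def feature_to_rows(feature):
    """Split one feature into per-table row lists (hypothetical row shape)."""
    place_id = feature["@id"]
    title = feature.get("properties", {}).get("title")
    rows = {"places": [{"place_id": place_id, "title": title}]}
    for table, key in LIST_TABLES.items():
        rows[table] = [
            {"place_id": place_id, "json": item}
            for item in feature.get(key, [])
        ]
    # Geometry and "when" are single objects rather than lists in this sketch.
    geom = feature.get("geometry")
    rows["place_geom"] = [{"place_id": place_id, "json": geom}] if geom else []
    when = feature.get("when")
    rows["place_when"] = [{"place_id": place_id, "json": when}] if when else []
    return rows


# Usage with a made-up feature.
feature = {
    "@id": "http://example.org/places/1",
    "properties": {"title": "Example town"},
    "names": [{"toponym": "Example town"}, {"toponym": "Exampleville"}],
    "links": [{"type": "closeMatch", "identifier": "wd:Q0000002"}],
    "geometry": {"type": "Point", "coordinates": [10.0, 20.0]},
}
rows = feature_to_rows(feature)
```

Keeping each element in its own table, keyed by the place, is what makes contributed data easy to manage piece by piece before it is aggregated into the index.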
The API is one of the most important parts of the WHG platform, and the least developed right now. Stay tuned for further developments. Our intention is to provide access both to contributors’ individual records and datasets in the database (when designated public by their owner) and to the aggregated index records, each with numerous useful filtering capabilities.
Our index has been instantiated with records from modern gazetteer resources: 1) about 1,000 of the world’s most populous cities from GeoNames; 2) ~1.8 million place records from Getty TGN; 3) about 1,500 societies from the D-Place anthropological repository; and 4) major rivers, lakes, and mountain ranges from Natural Earth and Wildlife Research Institute.
To this modern “core” we have begun adding historical data: 1) 10,600 entities harvested from the index of the Atlas of World History (Dorling Kindersley, 1995), offering broad but shallow global coverage; and 2) our first specialist gazetteer, HGIS de las Indias, which consists of approximately 15,000 settlements and territories in colonial Latin America. There are several additional large datasets in the queue, which we will be adding in partnership with contributors. Some are previewed as heat maps on our Maps page.
Our Pelagios Connections
The WHG platform borrows extensively from the Peripleo application developed by Rainer Simon of the Pelagios project, extending it significantly in a few ways. Our backend architecture closely mimics that underlying both Peripleo and the Recogito annotation tool, and we are actively collaborating with Rainer and the entire Pelagios Network team on several aspects of this work. In particular, we are co-developing the data format standards for contributions to both systems: Linked Places format, and a nascent Linked Traces annotation format.
We welcome suggestions, critiques, even praise :^) – and there is an email form on the site that makes it easy to offer them. Please bear with us in this active development stage and check back as we realize the system’s potential more fully over the next several months. Look for further blog posts and follow us on Twitter; we tweet progress and related information as @WHGazetteer and @kgeographer.
