At long last we are ready to offer a v0.2 beta release of the World Historical Gazetteer (WHG) at http://dev.whgazetteer.org.
We hope that spatial historians and spatio-temporal infrastructure
developers will be interested in taking a look at what we are building
and experimenting with their own data or the provided samples. It is a
“sandbox,” so nothing will be saved for the time being (that will change soon). There
are 5-6 months remaining in the term of our initial NEH grant, time
enough to complete most of what we planned for this phase, and to
incorporate more suggestions from users and potential contributors as we
move toward future planning and development.
The site includes a brief guide titled “WHG Beta Release: A Tour,”
which outlines what is there, what you can do and how, remaining
challenges, and what is in the works. What follows is a higher-level
introduction.
Places and Traces
The World Historical Gazetteer is a Linked Open Data platform for
publishing, linking, discovering, and visualizing contributed records of
attested historical places and traces.
Our initial focus has been on places, but we are working experimentally
to demonstrate how places can be integrated within the platform with what
we now call traces, defined as web resources about historical entities
for which location in time and space is of scholarly and general
interest. We are considering three classes of traces for the time being:
agents (people and groups), works (e.g. artifacts, texts, datasets), and events
(e.g. journeys, conflicts). Our objective has been to create the first
large-scale spatial infrastructure for world history: oriented toward
documenting the human past at the global scale, and particularly the
geography of global and transregional connections.
Our accessioning process is intended to eventually be largely
self-directed; getting it to that stage means working directly and
hands-on with our early contributors.
LOD Publication
Registered users of WHG can publish their place records as Linked Open Data simply by uploading them in Linked Places format (or the LP-TSV version
intended for relatively simpler records). We see LOD publication as a
key feature for researchers who are not in a position to stand up their
own web interfaces with per-place pages. Once uploaded, each record will
have a permanent URI and be accessible in our graphical interface and
API, putting it on its way to being LOD in good standing. The dataset can be
browsed immediately by its owner in a searchable table and map, but
turning the uploaded dataset into a contribution for accessioning
requires some further steps. The data needs to have as many asserted
links to name authorities as possible, and geometry needs to be added
where it is missing and can be found. We provide reconciliation services
for that purpose.
Reconciliation
Simply put, reconciliation is the process of identifying matches
between records of named entities. In this case the records are for
places, and the matches are between a researcher’s records and those in
existing place name authorities. So far, we provide reconciliation
services for the Getty Thesaurus of Geographic Names (TGN) and Wikidata; DBpedia and GeoNames are planned.
The reconciliation process has two steps: 1) sending records to the
authority, and 2) reviewing the prospective matches returned and
accepting or declining them as appropriate. This somewhat laborious
process yields 1) links to authority records, and 2) additional
geometry. Once augmented in this way, a dataset is ready for accessioning.
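The review step can be illustrated with a minimal Python sketch. It is not WHG code: the PlaceRecord and Candidate classes, the apply_review function, and the "auth:" identifiers are all hypothetical names chosen for illustration, and the record shape is far simpler than Linked Places format. It only shows how accepted matches might add links and fill in missing geometry.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class PlaceRecord:
    """Hypothetical, simplified stand-in for a contributed place record."""
    record_id: str
    name: str
    links: List[str] = field(default_factory=list)   # authority identifiers
    geometry: Optional[Tuple[float, float]] = None   # (lon, lat), if known


@dataclass
class Candidate:
    """Hypothetical prospective match returned by a name authority."""
    authority_id: str
    name: str
    geometry: Optional[Tuple[float, float]] = None


def apply_review(record: PlaceRecord,
                 candidates: List[Candidate],
                 accepted_ids: Set[str]) -> PlaceRecord:
    """Apply a reviewer's decisions: accepted candidates add a link and,
    where the record has no geometry yet, contribute their geometry."""
    for cand in candidates:
        if cand.authority_id not in accepted_ids:
            continue  # declined by the reviewer
        if cand.authority_id not in record.links:
            record.links.append(cand.authority_id)
        if record.geometry is None and cand.geometry is not None:
            record.geometry = cand.geometry
    return record


# Made-up example data: one candidate accepted, one declined.
record = PlaceRecord("r1", "Exampleville")
candidates = [
    Candidate("auth:1", "Exampleville", (10.0, 20.0)),
    Candidate("auth:2", "Exampleville", (30.0, 40.0)),
]
apply_review(record, candidates, accepted_ids={"auth:1"})
```

After the call, the record carries one link and the geometry of the accepted candidate; the declined candidate leaves no trace.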
Accessioning
The last step is another reconciliation effort, this time against the WHG
index. Each record is compared to the growing WHG index to determine
whether we already have a contributed attestation for the place. If we do,
the incoming record becomes a “child” or “leaf” in the set of
attestations for the place. If the place is not yet accounted for, the
new record becomes a “parent,” the seed for a new set of attestations.
At this stage, an automatic linking can be made if two records share an
authority match, but the rest will have to be reviewed as described
above.
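As a rough sketch of the parent/child logic, the following Python models the index as a plain dictionary. This is an illustration under stated assumptions, not the WHG implementation (the real index is Elasticsearch): the function names accession and add_parent, the dictionary layout, and the "auth:" identifiers are all hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


def accession(index: Dict[str, dict],
              record_id: str,
              authority_links: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Try to link an incoming record automatically.

    Returns ("child", parent_id) when the record shares at least one
    authority link with an existing parent, otherwise ("review", None)
    to signal that a person has to decide.
    """
    links = set(authority_links)
    for parent_id, entry in index.items():
        if entry["links"] & links:          # shared authority match
            entry["children"].append(record_id)
            return ("child", parent_id)
    return ("review", None)


def add_parent(index: Dict[str, dict],
               record_id: str,
               authority_links: Iterable[str]) -> None:
    """After review finds no existing place, seed a new set of attestations."""
    index[record_id] = {"links": set(authority_links), "children": []}


# Made-up example: the first record seeds a place, the second joins it.
index: Dict[str, dict] = {}
if accession(index, "a", ["auth:1"])[0] == "review":
    add_parent(index, "a", ["auth:1"])      # reviewer found no existing place
result = accession(index, "b", ["auth:1", "auth:2"])
```

Here the second record is linked automatically because it shares "auth:1" with the first; a record with no shared link would come back marked for review.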
Graphical Interface
The opening screen of WHG lets users search for places and traces.
We try to offer enough context on the opening screen to identify the
likeliest match. Once you identify a place of interest, clicking its
name takes you to a “place portal” screen, where everything we have about
the place, or linked to it in some way, will appear: attestations from
contributors, associated traces, nearby places, physical geographic
context (rivers, watersheds, ecoregions). The place portal is very much a
work in progress at this stage. Several other features are also on our
near-term to-do list, including advanced search; more and better maps;
user data collections; project team “workspaces”; batching of
reconciliation tasks; and more.
A Word About Architecture
There are two data stores within the WHG platform: a relational
database (PostgreSQL) and a high-speed index (Elasticsearch). All
uploaded data gets imported to a set of relational tables whose names
correspond to the elements of Linked Places format: places, place_name, place_type, place_geom, place_link, place_when, place_related, place_description, and place_depiction.
Contributed data is most readily managed in that form. Upon
accessioning, records are added to the index in the manner described
under Accessioning above.
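To make the table layout concrete, here is a small Python sketch that splits one simplified record into rows grouped by the table names listed above. The table names come from this post; everything else is an assumption made for illustration, including the record keys ("names", "geoms", and so on), the to_rows function, and the row shape, none of which reflect the actual WHG schema.

```python
from typing import Dict, List

# Table names are those listed in the post; the record keys on the left
# are hypothetical and chosen only for this sketch.
TABLE_FOR_KEY = {
    "names": "place_name",
    "types": "place_type",
    "geoms": "place_geom",
    "links": "place_link",
    "whens": "place_when",
    "related": "place_related",
    "descriptions": "place_description",
    "depictions": "place_depiction",
}


def to_rows(place_id: int, record: dict) -> Dict[str, List[dict]]:
    """Split one record into per-table rows keyed by table name."""
    rows: Dict[str, List[dict]] = {
        "places": [{"id": place_id, "title": record.get("title")}]
    }
    for key, table in TABLE_FOR_KEY.items():
        # Each element of a record section becomes one row that points
        # back to its place through place_id.
        rows[table] = [
            {"place_id": place_id, "data": item}
            for item in record.get(key, [])
        ]
    return rows


# Made-up example record with a title, one name, and one link.
rows = to_rows(1, {
    "title": "Exampleville",
    "names": [{"toponym": "Exampleville"}],
    "links": ["auth:1"],
})
```

Keeping each element in its own table row is what makes a dataset easy to browse, filter, and augment before any of it reaches the index.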
An API
This part of the WHG platform is one of the most important, and the
least developed right now. Stay tuned for further developments. Our
intention is to provide access both to contributors’ individual records
and datasets from the database (when designated by their owner as
public) and to the aggregating index records, each with a range of
useful filtering capabilities.
Content
Our index has been instantiated with records from modern gazetteer
resources: 1) about 1,000 of the world’s most populous cities from
GeoNames; 2) ~1.8 million place records from Getty TGN; 3) about 1,500
societies from the D-Place anthropological repository; and 4) major rivers, lakes, and mountain ranges from Natural Earth and Wildlife Research Institute.
To this modern “core” we have begun adding historical data: 1) 10,600 entities harvested from the index of the Atlas of World History (Dorling Kindersley, 1995), offering broad but shallow global coverage; and 2) our first specialist gazetteer, HGIS de las Indias,
which consists of approximately 15,000 settlements and territories in
colonial Latin America. There are several additional large datasets in
the queue, which we will be adding in partnership with contributors.
Some are previewed as heat maps on our Maps page.
Our Pelagios Connections
The WHG platform borrows extensively from the Peripleo application developed by Rainer Simon of the Pelagios project, extending it significantly in a few ways. Our backend architecture closely mimics that underlying both Peripleo and the Recogito
annotation tool, and we are actively collaborating with Rainer and the
entire Pelagios Network team on several aspects of this work. In
particular, we are co-developing the data format standards for
contributions to both systems: Linked Places format, and a nascent Linked Traces annotation format.
Feedback
We welcome suggestions, critiques, even praise :^), and there is an
email form on the site that makes it easy to offer them. Please bear with
us in this active development stage and check back as we realize the
system’s potential more fully over the next several months. Look for
further blog posts and follow us on Twitter; we tweet progress and
related information as @WHGazetteer and @kgeographer.