Monday, October 8, 2018

Major Nomisma.org data model update: provenance

At long last, we have implemented provenance directly within the Nomisma.org RDF data model, something the scientific committee has discussed for some time. This was no easy task: it meant reverse engineering the entire editing history from the Nomisma data GitHub repository in order to establish a chronology of creation dates and significant modifications to the content of the SKOS concepts.

The provenance is encoded primarily with the W3C Provenance Ontology (PROV-O). Each concept now bears a skos:changeNote that points to a dcterms:ProvenanceStatement. This ProvenanceStatement includes a prov:wasGeneratedBy activity for the date of creation and zero or more prov:activity properties that indicate subsequent modifications. Each activity has a timestamp derived from the GitHub commit history.
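To make the shape of this model concrete, here is a minimal Python sketch that writes the structure out as Turtle. It is an illustration under stated assumptions, not the serialization Nomisma itself produces: the `#provenance` fragment for the statement URI, the use of prov:atTime for the timestamp, the blank-node activities, and the example concept, editor, and dates are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

PREFIXES = """@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
"""


@dataclass
class Activity:
    timestamp: str                 # ISO 8601 date-time taken from a commit
    editor: Optional[str] = None   # URI in the http://nomisma.org/editor/ namespace

    def to_turtle(self) -> str:
        # Assumption: the timestamp is carried by prov:atTime.
        parts = [f'prov:atTime "{self.timestamp}"^^xsd:dateTime']
        if self.editor:
            parts.append(f"prov:wasAssociatedWith <{self.editor}>")
        return "[ a prov:Activity ; " + " ; ".join(parts) + " ]"


@dataclass
class ConceptProvenance:
    concept: str                   # URI of the SKOS concept
    created: Activity              # the single creation event
    modifications: List[Activity] = field(default_factory=list)

    def to_turtle(self) -> str:
        # Assumption: the statement URI is the concept URI plus "#provenance".
        statement = f"{self.concept}#provenance"
        lines = [
            PREFIXES,
            f"<{self.concept}> skos:changeNote <{statement}> .",
            "",
            f"<{statement}> a dcterms:ProvenanceStatement ;",
            f"    prov:wasGeneratedBy {self.created.to_turtle()}",
        ]
        # Zero or more later modifications, one prov:activity each.
        for mod in self.modifications:
            lines[-1] += " ;"
            lines.append(f"    prov:activity {mod.to_turtle()}")
        lines[-1] += " ."
        return "\n".join(lines)


# Hypothetical concept, editor, and dates, for illustration only.
example = ConceptProvenance(
    concept="http://nomisma.org/id/example_concept",
    created=Activity("2012-10-26T00:00:00Z"),
    modifications=[
        Activity("2018-10-01T12:00:00Z", "http://nomisma.org/editor/example_editor")
    ],
)
print(example.to_turtle())
```

A concept that has never been modified simply yields a statement with the prov:wasGeneratedBy activity and no prov:activity properties.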

When possible, each activity also includes a prov:wasAssociatedWith property that links to a URI in the new http://nomisma.org/editor/ namespace. Any Nomisma ID that existed at the time of the first GitHub commit was presumed to have been created by Andy Meadows and/or Sebastian Heath, but attribution becomes more complicated after that point. Many IDs minted since August 2015 have been generated by a spreadsheet import mechanism, and it is important to be able to link a concept to the Google spreadsheet that created or modified it. We therefore use prov:used to link to the public HTML version of the spreadsheet, and we also include some basic metadata about the spreadsheet (the URIs of the Nomisma editors who contributed to its creation, the description of the spreadsheet, etc.). Try a DESCRIBE SPARQL query for the URI, https://docs.google.com/spreadsheets/d/19N59I8u6CnwDYfSHsr10xDt50fIRp_EyqZP_BGVaH4U/pubhtml, for example.
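The DESCRIBE query suggested above could be sent from a script along these lines. This is a rough sketch using only the Python standard library. The endpoint address `http://nomisma.org/query` and the `text/turtle` Accept header are assumptions to check against the current Nomisma documentation; the `query` parameter name comes from the SPARQL 1.1 Protocol. The helper names are illustrative only.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: address of the Nomisma SPARQL endpoint.
ENDPOINT = "http://nomisma.org/query"

SPREADSHEET = (
    "https://docs.google.com/spreadsheets/d/"
    "19N59I8u6CnwDYfSHsr10xDt50fIRp_EyqZP_BGVaH4U/pubhtml"
)


def describe_query(uri: str) -> str:
    """Build a DESCRIBE query for one URI, rejecting characters illegal in an IRI reference."""
    if any(ch in uri for ch in '<>" {}|\\^`'):
        raise ValueError(f"not a safe IRI: {uri!r}")
    return f"DESCRIBE <{uri}>"


def request_url(endpoint: str, query: str) -> str:
    """Encode the query as a GET request, per the SPARQL 1.1 Protocol."""
    return f"{endpoint}?{urlencode({'query': query})}"


def fetch_description(uri: str, endpoint: str = ENDPOINT, timeout: float = 30.0) -> str:
    """Send the query and return the response body as text."""
    request = Request(
        request_url(endpoint, describe_query(uri)),
        headers={"Accept": "text/turtle"},  # assumption: endpoint can return Turtle
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(fetch_description(SPREADSHEET))
```

The same helper works for any concept or editor URI, so it can also be used to inspect the provenance attached to an individual Nomisma ID.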

By collating the GitHub commit history with all of the known spreadsheet imports, we have been able to link thousands of concepts to a few dozen spreadsheet uploads. Other groups of manually created IDs in several categories have been attributed to known editors: Medieval and Modern German IDs to Karsten Dahmen and Walter Bloom; Byzantine rulers to Dennis Mathie. This reverse engineering of all of the Nomisma IDs took about two weeks. Beyond the data itself, the Nomisma framework codebase was updated so that the HTML output displays provenance, and the back-end editor and import XForms apps were modified to accommodate the creation and updating of provenance events.
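The collation step can be pictured with a small sketch. This is not the procedure actually used for Nomisma, which is not described in detail here; it only illustrates the general idea under assumed inputs. The commit and import record formats, the function names, and the one-hour matching window are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def build_chronology(commits):
    """Map each concept ID to (creation time, list of later modification times).

    `commits` is an iterable of (ISO timestamp, list of concept IDs touched);
    this record shape is an assumption made for the sketch.
    """
    history = defaultdict(list)
    for timestamp, concept_ids in commits:
        when = datetime.fromisoformat(timestamp)
        for concept_id in concept_ids:
            history[concept_id].append(when)
    chronology = {}
    for concept_id, times in history.items():
        ordered = sorted(times)
        # The earliest commit is treated as creation, the rest as modifications.
        chronology[concept_id] = (ordered[0], ordered[1:])
    return chronology


def match_import(when, imports, window=timedelta(hours=1)):
    """Return the spreadsheet URL of an import close in time to `when`, if any.

    `imports` is a list of (datetime, spreadsheet URL); the window is arbitrary.
    """
    for imported_at, url in imports:
        if abs(when - imported_at) <= window:
            return url
    return None
```

An event with no matching import would then fall back to manual attribution, for instance to the editors known to have worked on that category of IDs.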
