Tuesday, October 1, 2019

Coptic Scriptorium News: Fall 2019 Corpora Release 3.0.0

Coptic Scriptorium is happy to announce our latest data release, including a variety of new sources thanks to our collaborators (digitized data courtesy of the Marcion and PAThs projects!). New in this release are:
  • Saints’ lives
    • Life of Cyrus
    • Life of Onnophrius
    • Lives of Longinus and Lucius
    • Martyrdom of Victor the General (part 2)
  • Miscellaneous:
    • Dormition of John
    • Homilies of Proclus
    • Letter of Pseudo-Ephrem
We are also releasing expansions to some of our existing corpora, including:
  • Canons of Johannes (new material annotated by Elizabeth Platte and Caroline T. Schroeder, digital edition provided by Diliana Atanassova)
  • Some Kinds of People Sift Dirt
  • Apophthegmata Patrum
  • A large number of corrections to most of our existing corpora, which are being republished in this release.
All documents include metadata for word segmentation, tagging, and parsing that indicates the review status of each of these annotation layers: machine annotation only (automatic), checked for accuracy by an expert in Coptic (checked), or closely reviewed for accuracy, usually as a result of manual parsing (gold).
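Because the three review levels form an ordered scale, a user who only wants human-reviewed data can filter documents by a minimum level. Below is a minimal Python sketch, assuming each document's metadata is available as a dictionary. The keys `name`, `tagging` and `parsing` and the sample records are hypothetical and do not reflect the project's actual metadata field names.

```python
from enum import IntEnum


class Review(IntEnum):
    """Review levels, ordered from least to most reviewed."""
    AUTOMATIC = 0  # machine annotations only
    CHECKED = 1    # checked for accuracy by an expert in Coptic
    GOLD = 2       # closely reviewed, usually via manual parsing


def meets(meta: dict, layer: str, minimum: Review) -> bool:
    """True if the status recorded for `layer` is at least `minimum`.

    `meta` is a hypothetical dict such as {"tagging": "checked"}.
    """
    return Review[meta[layer].upper()] >= minimum


def filter_docs(docs: list, layer: str, minimum: Review) -> list:
    """Return the names of documents whose `layer` meets the minimum level."""
    return [d["name"] for d in docs if meets(d, layer, minimum)]


# Hypothetical example records (not real corpus metadata)
docs = [
    {"name": "doc_a", "tagging": "automatic", "parsing": "automatic"},
    {"name": "doc_b", "tagging": "checked", "parsing": "automatic"},
    {"name": "doc_c", "tagging": "gold", "parsing": "gold"},
]

print(filter_docs(docs, "tagging", Review.CHECKED))  # ['doc_b', 'doc_c']
```

An `IntEnum` is used so that "at least checked" becomes a plain numeric comparison. This mirrors the scale's order: automatic, then checked, then gold.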
You can search all corpora using ANNIS and download the data in four formats (relANNIS database files, PAULA XML files, TEI XML files, and SGML files in TreeTagger format): browse them on GitHub. If you just want to read works, cite project data, or browse metadata, you can use our updated repository browser, the Canonical Text Services browser and URN resolver.
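TreeTagger-style SGML places one token per line, and SGML tags on their own lines mark the annotation spans that enclose the tokens. Below is a minimal Python sketch of a reader for such files. It assumes a simplified layout: every tag sits alone on its line, tags are well nested, and annotations are stored as `name="value"` attributes. The element and attribute names in the sample (`verse`, `tok`, `n`, `pos`, `lemma`) are invented for illustration. Real release files carry more annotation layers and may be structured differently.

```python
import re

# Opening tag on its own line, e.g. <tok pos="CONJ" lemma="x"> (attributes optional)
OPEN = re.compile(r'^<([A-Za-z_][\w.-]*)((?:\s+[\w.-]+="[^"]*")*)\s*>$')
# Closing tag on its own line, e.g. </tok>
CLOSE = re.compile(r'^</([A-Za-z_][\w.-]*)>$')
# A single name="value" attribute
ATTR = re.compile(r'([\w.-]+)="([^"]*)"')


def read_tt(lines):
    """Yield (token, annotations) pairs from TreeTagger-style SGML lines.

    `annotations` merges the attributes of all currently open elements;
    attributes of inner elements override those of outer ones.
    """
    stack = []  # open elements as (name, attribute dict)
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        m = OPEN.match(line)
        if m:
            stack.append((m.group(1), dict(ATTR.findall(m.group(2)))))
            continue
        m = CLOSE.match(line)
        if m:
            if not stack or stack[-1][0] != m.group(1):
                raise ValueError(f"unexpected closing tag </{m.group(1)}>")
            stack.pop()
            continue
        # Any other line is a token
        annotations = {}
        for _, attrs in stack:
            annotations.update(attrs)
        yield line, annotations


# Invented sample in the assumed layout
sample = [
    '<verse n="1">',
    '<tok pos="CONJ" lemma="ⲁⲩⲱ">',
    'ⲁⲩⲱ',
    '</tok>',
    '</verse>',
]

for token, ann in read_tt(sample):
    print(token, ann)  # ⲁⲩⲱ {'n': '1', 'pos': 'CONJ', 'lemma': 'ⲁⲩⲱ'}
```

A stack of open elements lets each token inherit every span that covers it. This matches how span annotations nest around tokens in this kind of format.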
Our annotated corpora now total over 850,000 words, and corpora whose machine annotations have been reviewed by human editors now exceed 150,000 words!
We would like to thank Marcion, PAThs and the National Endowment for the Humanities for supporting us – we hope this release will be useful and are already working on more!
