It is our pleasure to announce the latest data release from Coptic Scriptorium, version 4.2.0. This release contains both new Coptic material and additions to older datasets, and it expands our entity annotations and named-entity linking to all of our data, including the semi-automatically annotated Old Testament. This also means automatic updates to all of our interfaces, such as the recently added example usage functionality in the Coptic Dictionary Online, which is linked to the corpora.
The new material includes more digitized data courtesy of the Marcion project, as well as manually digitized and corrected OCR data from out-of-print editions:
- Encomium of Pseudo-Celestinus on Victor (annotations by Mitchell Abrams and Lance Martin)
- Encomium of Pseudo-Flavianus on Demetrius, Archbishop of Alexandria (annotations by Mitchell Abrams, Lance Martin and Amir Zeldes)
- Added works by Shenoute of Atripe:
  - In the Night (Canons 9, annotations by Lance Martin, Caroline T. Schroeder and Amir Zeldes)
  - Because of You Too O Prince of Evil (Discourses 4, annotations by Tamara Siuda, Lance Martin and Caroline T. Schroeder)
- Expansions and improvements of existing corpora:
  - More Apophthegmata Patrum (work by Christine Luckritz Marquis, So Miyagawa, Caroline T. Schroeder and Amir Zeldes)
  - Further material from Shenoute’s works:
    - God Says Through Those Who Are His (including parallel witnesses and new material, data courtesy of David Brakke, annotations by Rebecca Krawiec, Lance Martin, Dana Robinson, Caroline T. Schroeder)
    - Acephalous Work 22 (data courtesy of David Brakke, annotations by Elizabeth Davidson, Rebecca Krawiec, Elizabeth Platte, Caroline T. Schroeder, Amir Zeldes)
  - More syntactically annotated gold treebanked data in the Coptic Treebank
  - Completely re-annotated Old Testament corpus, built on the base text courtesy of the Digital Edition of the Coptic Old Testament (CoptOT) project, with improved segmentation and parsing, now complete with semi-automatic entity recognition and linking to Wikipedia entries for people and places
With this new release, the semi-automatically annotated data (excluding automatically processed Bible materials) in the project covers close to 300,000 words of Sahidic Coptic annotated for entities.
This release represents a tremendous amount of work over the past few months by the Coptic Scriptorium team. We would also like to thank individual contributors (who are always listed in the ‘annotation’ metadata for each document), and specifically So Miyagawa for help with Coptic OCR models, as well as the Marcion and CoptOT projects for sharing their data with us, and the National Endowment for the Humanities for supporting us. We are continuing to work on more data, links to other resources, and new kinds of annotations and tools. Please let us know if you have any feedback!