Automatic linguistic analysis and Entity Linking from I Samuel 25
It is our pleasure to announce the latest data release from Coptic
Scriptorium, version 4.2.0. This release contains both new Coptic
material and additions to older datasets, and it extends our
entity annotations and named-entity linking to all of our data,
including the semi-automatically annotated Old Testament. This also means
automatic updates to all of our interfaces, such as the recently added
example usage functionality in the Coptic Dictionary Online, which is linked to the corpora.
The new material, which includes more digitized data courtesy of the Marcion project as well as manually digitized and corrected OCR data from out-of-print editions, comprises:
More Apophthegmata Patrum (work by Christine Luckritz Marquis, So Miyagawa, Caroline T. Schroeder and Amir Zeldes)
Further material from Shenoute’s works:
God Says Through Those Who Are His (including parallel witnesses and new material, data courtesy of David Brakke, annotations by Rebecca Krawiec, Lance Martin, Dana Robinson, Caroline T. Schroeder)
Acephalous Work 22 (data courtesy of David Brakke, annotations by Elizabeth Davidson, Rebecca Krawiec, Elizabeth Platte, Caroline T. Schroeder, Amir Zeldes)
More syntactically annotated gold treebanked data in the Coptic Treebank
Completely re-annotated Old Testament corpus, built on the base text courtesy of the Digital Edition of the Coptic Old Testament (CoptOT) project, with improved segmentation and parsing, now complete with semi-automatic entity recognition and linking to Wikipedia entries for people and places
With this new release, the semi-automatically annotated data
(excluding automatically processed Bible materials) in the project
covers close to 300,000 words of Sahidic Coptic annotated for entities.
This release represents a tremendous amount of work over the past few
months by the Coptic Scriptorium team. We would also like to thank
individual contributors (whom you can always find in the ‘annotation’
metadata for each document), and specifically So Miyagawa for help with
Coptic OCR models, as well as the Marcion and CoptOT projects for sharing
their data with us, and the National Endowment for the Humanities for
supporting us. We are continuing to work on more data, links to other
resources and new kinds of annotations and tools. Please let us know if
you have any feedback!