Alignment Gold Standards for Ancient Greek

Saturday, December 23, 2023

This repository contains guidelines and gold standard datasets for aligning Ancient Greek texts with their translations in various languages. Guidelines currently exist for Ancient Greek-English, Ancient Greek-Brazilian Portuguese, and Ancient Greek-Latin; more will be added in future work. The guidelines and their related gold standards were created by:

Chiara Palladino at Furman University (USA) and Farnoosh Shamsian at the University of Leipzig (Germany): Ancient Greek-English.
Chiara Palladino and David J. Wright at Furman University (USA): Ancient Greek-Latin.
Anise d'Orange Ferreira and Michel Ferreira dos Reis at UNESP Araraquara (Brazil): Ancient Greek-Portuguese.
Tariq Yousef (University of Leipzig): Gold Standard development and Inter-Annotator-Agreement (all datasets).

The guidelines were used to annotate a diverse dataset including Homeric epic, Attic prose, Platonic dialogue and the Fragmenta Historicorum Graecorum, and were validated by measuring inter-annotator agreement, which reached 80% or higher. Almost all of the Ancient Greek texts used are available through the Scaife viewer (https://scaife.perseus.org/), while the fragments are from the DFHG Project (https://www.dfhg-project.org/).
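As an illustration of what an inter-annotator agreement figure can mean for alignment data, the sketch below compares the alignment links produced by two annotators for the same sentence. It is only one common way of scoring agreement and is not necessarily the measure used for these datasets; the function name link_agreement, the representation of a link as a (Greek token index, translation token index) pair, and the sample links are all assumptions made for this example.

```python
def link_agreement(annotator_a, annotator_b):
    """Agreement between two sets of alignment links.

    Computed as 2 * |A & B| / (|A| + |B|), i.e. the share of links
    that both annotators produced. This formula is an assumption for
    illustration, not the repository's documented method.
    """
    a, b = set(annotator_a), set(annotator_b)
    if not a and not b:
        # Two empty annotations agree trivially.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical links: (Greek token index, translation token index).
links_a = {(0, 0), (1, 2), (2, 1), (3, 3), (4, 4)}
links_b = {(0, 0), (1, 2), (2, 1), (3, 3), (4, 5)}

score = link_agreement(links_a, links_b)
print(f"agreement: {score:.0%}")
```

Here the annotators share four of their five links each, so the score falls exactly on the 80% threshold mentioned above.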

The datasets used to develop the gold standard were aligned using the Ugarit Translation Alignment Editor for Historical languages (http://ugarit.ialigner.com/), and they are available in this repository.

The materials available here can be used to perform and evaluate alignments of various texts in Ancient Greek, to create new gold standard corpora, and to train automated translation models.
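Evaluating an alignment against a gold standard usually comes down to comparing two sets of links. The following is a minimal sketch under stated assumptions: links are (Greek token index, translation token index) pairs, the function name evaluate_alignment is hypothetical, and the sample data is invented. It does not read the file format of the datasets in this repository.

```python
def evaluate_alignment(predicted, gold):
    """Return (precision, recall, f1) of predicted links against gold links."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    # Guard against empty inputs to avoid division by zero.
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical data: four gold links, three predicted links, two of them correct.
gold_links = {(0, 0), (1, 1), (2, 2), (3, 3)}
predicted_links = {(0, 0), (1, 1), (2, 3)}

precision, recall, f1 = evaluate_alignment(predicted_links, gold_links)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Precision penalizes links that are not in the gold standard, recall penalizes gold links that were missed, and F1 balances the two.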

The guidelines can also be adapted to similar language pairs that combine a highly inflected language with an analytic one, such as Latin and English, or can provide a framework for aligning other historical texts with modern translations. However, the guidelines are project-specific: they were intended for the creation of a Gold Standard in the scenario of machine translation. Other scenarios, such as language research or pedagogy, may require adjusting the guidelines to fit different underlying principles.

For further information on Ugarit and translation alignment of historical languages, see http://ugarit.ialigner.com/bib.php and follow us on Twitter (@ugarit_ty).

Using this repository and the Guidelines

The data in this repository is provided under a CC-BY-SA license. If you use any of the materials in this repository, please cite our work as follows.