Friday, December 2, 2016

Open Greek and Latin Project of the Open Philology Project

The ultimate goal is to represent every source text produced in Classical Greek or Latin from antiquity through the present, including texts preserved in manuscript tradition as well as on inscriptions, papyri, ostraca and other written artifacts. Over the course of the next five years, we will focus upon converting as much as possible of the Greek and Latin available as scanned printed books into an open, dynamic corpus, continuously augmented and improved by a combination of automated processes and human contributions of many kinds. The focus upon Greek and Latin reflects both the belief that we have an obligation to disseminate European cultural heritage and the observation that recent advances in OCR technology for Greek and Latin make these intertwined languages ready for large-scale work.

The Open Greek and Latin Project aims to provide at least one version of every Greek and Latin source produced during antiquity (through c. 600 CE) and a growing collection from the vast body of post-classical Greek and Latin that still survives. Perhaps 150 million words of Greek and Latin, preserved in manuscripts, on stone, on papyrus or on other writing surfaces, survive from antiquity. Analysis of 10,000 books in Latin, downloaded from Archive.org, identified more than 200 million words of post-classical Latin. With 70,000 public domain books listed in the Hathi Trust as being in Ancient Greek or Latin, the amount of Greek and Latin already available will almost certainly exceed 1 billion words.
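The "1 billion words" figure can be checked with a rough extrapolation from the numbers quoted above. The sketch below assumes, purely for illustration, that the Hathi Trust books average about as many words as the Archive.org sample did; the project description does not say how its estimate was derived, and the variable names are ours.

```python
# Back-of-envelope extrapolation from the figures quoted in the text.
# Assumption (not from the source): the 70,000 listed books average the
# same number of words per book as the 10,000-book Archive.org sample.
sampled_books = 10_000
sampled_words = 200_000_000   # "more than 200 million", so a lower bound
listed_books = 70_000

words_per_book = sampled_words / sampled_books
projected_words = words_per_book * listed_books

print(f"{words_per_book:,.0f} words per book in the sample")
print(f"{projected_words:,.0f} words projected for the listed books")
```

Under that assumption the projection comes out above 1 billion words, which is consistent with the claim in the text.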

Where existing corpora of Greek and Latin have generally included one edition of a work, the Open Greek and Latin Corpus is designed to manage multiple versions of, and to represent the complete textual history of, a work: every manuscript, every papyrus fragment, and every printed edition is a version within the history of a text. In the short run, this involves using OCR technology optimized for Classical Greek and Latin to create an open corpus that is reasonably comprehensive for the c. 100 million words produced through c. 600 CE and that begins to make available the billions of words produced after 600 CE in Greek and Latin that survive.
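To make the "multiple versions of a work" idea concrete, here is a minimal sketch of how one work with several witnesses might be modelled. The class names, field names, version labels and placeholder texts are all hypothetical and chosen for illustration; this is not the project's actual schema, which is published as TEI and EpiDoc XML.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Version:
    """One witness to a text (hypothetical model, not the project's schema)."""
    label: str   # illustrative identifier for the witness
    kind: str    # "manuscript", "papyrus" or "edition"
    text: str    # transcribed or OCR-derived text


@dataclass
class Work:
    """A work together with every version in its textual history."""
    title: str
    versions: List[Version] = field(default_factory=list)

    def add(self, version: Version) -> None:
        self.versions.append(version)

    def by_kind(self, kind: str) -> List[Version]:
        # Return only the versions of the requested kind, in insertion order.
        return [v for v in self.versions if v.kind == kind]


# Illustrative data only: labels and texts are placeholders.
work = Work("Deipnosophistae")
work.add(Version("manuscript-A", "manuscript", "placeholder text"))
work.add(Version("papyrus-1", "papyrus", "placeholder text"))
work.add(Version("edition-1", "edition", "placeholder text"))

editions = work.by_kind("edition")
print([v.label for v in editions])
```

Treating a printed edition as just one more version, rather than as the canonical text, is the design choice the paragraph above describes.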

Open Greek & Latin Texts

A collection of machine-corrected XML versions of classical authors and works, freely available to download and reuse. Texts are published on GitHub on an ongoing basis. Watch this space and our Facebook page for updates.

athenaeus-dev

The works of Athenaeus of Naucratis, Greek rhetorician and grammarian.

church_fathers_dev

A selection of Church Fathers.

english_trans-dev

English translations of classical works.

misc-dev

An undefined collection of TEI and EpiDoc versions of classical texts.

cag-dev

The Commentaria in Aristotelem Graeca.

csel-dev

The Corpus Scriptorum Ecclesiasticorum Latinorum.

fragmentary-dev

A collection of classical fragmentary authors and works.

italian_trans-dev

Italian translations of classical works.

patrologia_latina-dev

The Patrologia Latina.

catenae-dev

The Catenae Graecorum Patrum in Novum Testamentum.

dfhg-dev

The Fragmenta Historicorum Graecorum.

french_trans-dev

French translations of classical works.

libanius-dev

The works of the Greek rhetorician Libanius.

philo-dev

The works of Philo Judaeus, the Hellenistic Jewish philosopher.
