Monday, February 3, 2014

CORPUS CORPORUM: repositorium operum Latinorum apud universitatem Turicensem

http://www.mlat.uzh.ch/MLS/pictures/CC.png

The site mlat.uzh.ch is a Latin text (meta-)repository and tool that is still under development. Users should bear in mind that many functions do not yet work satisfactorily. The Corpus Córporum is being developed at the University of Zurich under the direction of Ph. Roelli, Institute of Medieval Latin Studies. The project uses only free and open-source software and is non-commercial. Our main goals are:

- To provide a platform onto which standardised TEI XML files of Latin texts can be uploaded (if you would like to share your texts, please contact us) and from which they can be downloaded (unless copyright restricts this).
- To present these texts so that they can be read online. Clicking a Latin word in the text resolves its grammatical form (powered by Perseus and TreeTagger) and displays its entries in the following dictionaries: Georges (Latin-German), Lewis and Short (Latin-English) and DuCange (mediaeval Latin).
- To make these texts searchable in complex ways (including by lemma). Searches, word lists and concordances for the current level can be generated at the bottom left of the page (we use the open-source search engine Sphinx).
- To use the platform to publish Latin texts online (cf. the corpus of the Richard Rufus Project).
- To offer texts for download as TEI XML or plain-text files for non-commercial use (PDF to follow soon).
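For readers who want to turn a downloaded TEI file into plain text themselves, here is a minimal Python sketch. It assumes a TEI P5 file in the standard namespace `http://www.tei-c.org/ns/1.0` and is not the site's own export code; the function name `tei_to_text` and the sample text are illustrative only.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents (assumption: the file follows TEI P5).
TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def tei_to_text(xml_string: str) -> str:
    """Return the whitespace-normalised text of the TEI <body>, or '' if there is none."""
    root = ET.fromstring(xml_string)
    body = root.find(f".//{TEI_NS}body")
    if body is None:
        return ""
    # itertext() yields every text node below <body> in document order;
    # split()/join() then collapses line breaks and indentation into single spaces.
    return " ".join("".join(body.itertext()).split())


sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Sample</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <p>Gallia est omnis divisa</p>
    <p>in partes tres</p>
  </body></text>
</TEI>"""

print(tei_to_text(sample))  # Gallia est omnis divisa in partes tres
```

Because all text nodes are kept, editorial notes inside the body are included as well, and words marked up as adjacent elements with no whitespace between them (for example word-level `<w>` tags) would run together; such files need element-aware handling.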

The texts are divided into corpora on specific topics that can be searched and studied separately. The first such corpus consists of ten Latin translations of Aristotle's Physica, which were used to study how technical Greek could be rendered in Latin. Word frequency lists are also available on the server. This study was published in two papers.
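A word frequency list of this kind can be approximated in a few lines of Python. The sketch below counts surface word forms, not lemmas (lemma counts would need a tagger such as TreeTagger), and the optional u/v and i/j merging is an illustrative assumption, not how the lists on the server were produced. Comparing such lists for two translations of the same passage could be one simple way to spot differing lexical choices.

```python
import re
from collections import Counter


def word_frequencies(text: str, merge_uv_ij: bool = False) -> list[tuple[str, int]]:
    """Count word forms in `text`, most frequent first (ties keep first-seen order)."""
    text = text.lower()
    if merge_uv_ij:
        # Optional orthographic normalisation (an assumption, not a site setting):
        # treat v as u and j as i so that "vir" and "uir" count as one form.
        text = text.replace("v", "u").replace("j", "i")
    # [^\W\d_]+ matches runs of Unicode letters only (no digits or underscores).
    words = re.findall(r"[^\W\d_]+", text)
    return Counter(words).most_common()


sample = "Natura est principium motus. Motus autem est actus."
print(word_frequencies(sample))
# [('est', 2), ('motus', 2), ('natura', 1), ('principium', 1), ('autem', 1), ('actus', 1)]
```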

Thanks to a grant, we were able to intensify work on this project in 2013. The following steps are under way between 2013 and 2015: (i) adapting and installing TreeTagger to tag our texts automatically for part of speech and to lemmatise them, using Gabriele Brandolini's Latin data set, which is largely based on St. Thomas Aquinas (soon to be online); (ii) formatting and loading large amounts of text in order to test our software; (iii) time-delimited searches and chronologically ordered lists of search results; (iv) contact with the medieval Latin dictionaries involved in COST Action IS1005, which are currently working on a wiki-powered cross-search tool that is to be linked to this site in the future.
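Step (iii) might work roughly as sketched below. The sketch assumes, hypothetically, that each search hit carries an approximate date range for its text, since many medieval works can only be dated to a span of years; a hit is kept if that range overlaps the requested period, and the results are then listed chronologically. The `Hit` record, its fields, `time_delimited` and the sample data are made-up illustrations, not part of the site's software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """A hypothetical search hit with the date range of the text it comes from."""
    work: str
    date_from: int  # earliest plausible year of composition (negative = BC)
    date_to: int    # latest plausible year of composition
    snippet: str


def time_delimited(hits: list[Hit], start: int, end: int) -> list[Hit]:
    """Keep hits whose date range overlaps [start, end], oldest first."""
    selected = [h for h in hits if h.date_from <= end and h.date_to >= start]
    # Sort by earliest date, then latest date, then title for a stable listing.
    return sorted(selected, key=lambda h: (h.date_from, h.date_to, h.work))


# Made-up sample hits, for illustration only.
hits = [
    Hit("text C", 1250, 1274, "..."),
    Hit("text A", 1100, 1100, "..."),
    Hit("text B", 1180, 1220, "..."),
    Hit("text D", 1400, 1450, "..."),
]
for hit in time_delimited(hits, 1200, 1300):
    print(hit.date_from, hit.date_to, hit.work)
# 1180 1220 text B
# 1250 1274 text C
```

Testing for overlap rather than containment keeps texts whose dating straddles the boundary of the requested period; a stricter search could require the whole range to fall inside it instead.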

The data we transcribed ourselves are published under a Creative Commons Share-Alike licence and may thus be reused freely (but non-commercially) as long as the source is indicated. Most of the texts here, however, stem from various online sources. As far as we could determine, they are either in the public domain or their owners have granted us permission to use them. If you believe you hold rights to a text published here in error, please contact us and we will delete it or restrict access to it. A short presentation of this project's goals and plans can be downloaded here as a PDF containing a demonstration video (11 MB).

The administrator of this server is Dr. Philipp Roelli; the application code is being developed by Max Bänziger. To contact us, write to turicense@gmail.com. Last updated Jan. 2014.
