Friday, December 11, 2015

CORPUS CORPORUM: repositorium operum Latinorum apud universitatem Turicensem

[First posted in AWOL 3 February 2014, updated 11 December 2015]

The site mlat.uzh.ch is a Latin text (meta-)repository and tool under development. Users should bear in mind that some functions do not yet work satisfactorily. This Corpus Córporum is being developed at the University of Zurich under the direction of Ph. Roelli, Institute for Greek and Latin Philology. The project uses exclusively free and open-source software and is non-commercial. Our main goals are:
  • To provide a platform into which standardised (TEI) xml-files of Latin texts can be loaded (if you would like to share your texts, please contact us) and downloaded (unless copyrights or the texts' providers restrict this). 
  • To make these texts searchable in complex ways (including proximity search and lemmatised search). Search results, wordlists and concordances can be generated for the current text level at the bottom left of the page (we use the open-source software Sphinx). 
  • To be able to use the platform to publish Latin texts online (cf. the Richard Rufus Project's corpus). 
  • Texts may be downloaded as TEI xml or txt-files for non-commercial use (in snippets also as pdf) and can thus be reused by other researchers.
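To illustrate what a proximity search does, here is a minimal Python sketch. It is not the site's Sphinx-based implementation; the function names, the simple tokeniser and the sample sentence are assumptions made for illustration only. It matches surface forms and does no lemmatisation.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def proximity_hits(text, first, second, max_gap):
    """Return (i, j) token positions where `first` and `second`
    occur at most `max_gap` tokens apart, in either order.

    Hypothetical helper: it compares surface forms only.
    """
    tokens = tokenize(text)
    first_pos = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_pos = [j for j, t in enumerate(tokens) if t == second.lower()]
    return [
        (i, j)
        for i in first_pos
        for j in second_pos
        if i != j and abs(i - j) <= max_gap
    ]


if __name__ == "__main__":
    # Sample sentence chosen for illustration, not taken from the corpus.
    sample = "Motus est actus entis in potentia secundum quod in potentia"
    print(proximity_hits(sample, "actus", "potentia", 3))
```

A lemmatised search would first map every token to its lemma (for example with a tagger such as TreeTagger) and then compare lemmas instead of surface forms.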
The texts are divided into corpora, each on a specific topic, which can be searched and studied separately: the first such corpus consists of ten translations of Aristotle's Physica into Latin. They were used to study how technical Greek language could be translated into Latin. Word frequency lists are also on the server. This study was published in two papers.
Dictionaries
To facilitate online reading, Latin words in the text can be resolved to their lemma form by clicking on them (powered by Perseus and TreeTagger with Gabriele Brandolini's Latin data [here the tag-set used], and additionally by the wordlist of Comphistsem [University of Frankfurt, led by Prof. Bernhard Jussen]). Entries in the following dictionaries are then displayed: Georges (Latin-German), Lewis and Short (Latin-English), Du Cange (mediaeval Latin), Schütz (scholastic Latin), Graesse (toponomastics); Niermeyer can also be consulted from within the University of Zurich's IP range. For Greek, LSJ (1940) and Pape are available but do not yet work very well.
These dictionaries may be used as a search engine in your internet browser. To set this up in Chromium (or Google Chrome), right-click the URL box and choose "edit search engines". There, add a new engine by filling in: Corpus Corporum / cc (or another shortcut of your choice) / http://mlat.uzh.ch/MLS/info_frame.php?w=%s (where %s stands for the word you search for). In Firefox this can only be done by installing an add-on that allows users to add custom search engines.
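The same URL pattern can be filled in by a script. The following is a minimal Python sketch, assuming the lookup page accepts a percent-encoded UTF-8 word in the w parameter, as a browser would send it; the helper name lookup_url is hypothetical and not part of the project.

```python
from urllib.parse import quote

# URL template quoted from the text above; %s marks where the word goes.
TEMPLATE = "http://mlat.uzh.ch/MLS/info_frame.php?w=%s"


def lookup_url(word, template=TEMPLATE):
    """Build the dictionary lookup URL for `word` (hypothetical helper)."""
    # Percent-encode the word as UTF-8 so spaces and non-ASCII
    # characters are safe inside a query string.
    return template.replace("%s", quote(word.strip(), safe=""))


if __name__ == "__main__":
    print(lookup_url("natura"))
```

The sketch only builds the address; it does not contact the server.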
License
The data we transcribed ourselves (including three editions by Philipp Roelli) are published under Creative Commons Share-Alike and may thus be reused freely (but non-commercially) as long as the source is indicated. Most of the texts present here, however, stem from various online sources. As far as we could determine, they are either in the public domain or their use was granted to us by their owners. If you believe you hold rights to a text erroneously published here, please contact us and we will delete it or restrict access to it. A short presentation of this project's goals and plans can be downloaded here as a pdf containing a demonstration video (11MB). An article in Archivum Latinitatis Medii Aevi (ALMA) 72 (2014) explains the project and its background more fully.
The administrator of this server is Dr. Philipp Roelli; the application code is being developed by Max Bänziger. To contact us, write to turicense@gmail.com. Last updated June 2015.
