Sunday, May 15, 2022

CORPUS CORPORUM: repositorium operum Latinorum apud universitatem Turicensem

[First posted in AWOL 3 February 2014, updated 15 May 2022]

The site mlat.uzh.ch is a Latin text (meta-)repository and tool that is still under development. Users should bear in mind that some functions do not yet work satisfactorily. This Corpus Córporum is being developed at the University of Zurich under the direction of Ph. Roelli, Institute for Greek and Latin Philology. The project uses exclusively free and open software and is non-commercial. Our main goals are:

  • To provide a platform into which standardised (TEI) xml-files of Latin texts can be loaded (if you would like to share your texts, please contact us) and downloaded (unless copyrights or the texts' providers restrict this).
  • To make these texts searchable in complex ways (including proximity search and lemmatised search). Search results, wordlists and concordances can be generated for the current text level at the bottom left of the page (we use the open-source software Sphinx).
  • To be able to use the platform to publish Latin texts online (cf. the Richard Rufus Project's corpus).
  • Texts may be downloaded as TEI xml or txt-files for non-commercial use (in snippets also as pdf) and can thus be reused by other researchers.
  • Note that the XML files come from various sources and, although they should approach the TEI xml standard, some of them do not validate as TEI because of minor deviations from the strict TEI guidelines.
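
To give a rough idea of what a proximity search over a downloaded TEI file involves, here is a minimal sketch in Python. It is not how the site or Sphinx implements search; the sample document, the function names `body_tokens` and `proximity_hits`, and the token-distance definition are all assumptions made for illustration. It only checks that the file is well-formed XML (not that it validates as TEI), and it matches surface forms only, so it is no substitute for the lemmatised search offered on the platform.

```python
import re
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace, in ElementTree's "{uri}" notation.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical miniature TEI document, not taken from the repository.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Exemplum</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <p>Omnis motus est in tempore, et tempus est numerus motus.</p>
  </body></text>
</TEI>"""


def body_tokens(tei_xml: str) -> list[str]:
    """Return lower-cased word tokens from the TEI <body>, skipping the header.

    Raises xml.etree.ElementTree.ParseError if the input is not well-formed XML.
    """
    root = ET.fromstring(tei_xml)
    body = root.find(f".//{TEI_NS}body")
    if body is None:
        return []
    text = " ".join(body.itertext())
    # Runs of letters only; digits, punctuation and underscores are dropped.
    return re.findall(r"[^\W\d_]+", text.lower())


def proximity_hits(tokens: list[str], first: str, second: str,
                   max_distance: int) -> list[tuple[int, int]]:
    """Return index pairs (i, j) where `first` is at i, `second` is at j,
    and the two tokens are at most `max_distance` positions apart."""
    pos_first = [i for i, tok in enumerate(tokens) if tok == first]
    pos_second = [j for j, tok in enumerate(tokens) if tok == second]
    return [(i, j) for i in pos_first for j in pos_second
            if i != j and abs(i - j) <= max_distance]


if __name__ == "__main__":
    tokens = body_tokens(SAMPLE_TEI)
    print(proximity_hits(tokens, "tempus", "motus", 3))
```

Because the comparison is on exact word forms, a query for "tempus" here does not find "tempore"; a lemmatised index would map both to the same lemma first.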

The texts are divided into corpora, each on a specific topic, that can be searched and studied separately: the first such corpus consists of ten translations of Aristotle's Physica into Latin. They were used to study how Greek technical language could be translated into Latin. Word frequency lists are also on the server. This study was published in two papers.

What's new?

22.12.2021: Launch of a completely new version of Corpus Corporum. Some features (such as the synoptic Bible or POS-searches) are not yet implemented. We are happy to receive bug reports and feedback from 1.2.2022 onward (turicense@gmail.com).
22.12.2021: The Mirabile Corpus was added (our special thanks to SISMEL Firenze!). The Corpus Corporum now contains over 170 M words.
22.3.2019: reached 160 million words.
1.3.2018: we have reached 150 million words. Several new corpora are in preparation.
21.6.2017: new dictionary added from our Czech colleagues: Latinitatis medii aevi lexicon Bohemorum (www.ics.cas.cz/en), thanks to Pavel Nývlt!
19.10.2016: new part-of-speech-dependent search option, for details cf. HELP (bottom left frame).
10.5.2016: new dictionary added: Gaffiot, Dictionnaire latin-français. Thanks to G. Gréco, M. De Wilde, B. Maréchal, K. Ôkubo!
2.5.2016: links to author pages in Mirabile (SISMEL Firenze) added.
2.4.2016: author data and links to VIAF, DNB and Wiki added.
4.2.2016: new feature: synoptic Bible in Hebrew, Greek, Latin and English.
