Sunday, September 25, 2016

Archive of the XML files of the Mannheim / Heidelberg CAMENA Neo-Latin project


CAMENA - Latin Texts of Early Modern Europe: the XML files


CAMENA (Corpus Automatum Multiplex Electorum Neolatinitatis Auctorum) was a DFG-funded research project led by Prof. Dr. Wilhelm Kühlmann and active from 1999 to 2013. It was carried out at the Chair of German Literature (Modern Period) in the German Department of Heidelberg University, in cooperation with the Information Technology Center and the Library of the University of Mannheim. We particularly thank Wolfgang Schibel, the project's spiritus movens, as well as Reinhard Gruhl, Emir Zuljevic, Heinz Kredel, and the other members of the team.

In our opinion, CAMENA was one of the most important Neo-Latin digital initiatives. Because its machine-readable texts were made available under the Creative Commons Attribution-ShareAlike license, we are republishing the XML files of all CAMENA collections here as a GitHub repository. All of the original project's caveats regarding citation and reliability still apply. Our intent is to enable further digital experiments with CAMENA Neo-Latin material.

Again, our sincere gratitude goes to the colleagues involved in CAMENA for all their efforts and for making this possible. Sumus nani gigantum humeris insidentes (we are dwarfs perched on the shoulders of giants).


In CAMENA, the texts are divided into five collections: POEMATA, Neo-Latin poetry composed by German authors; HISTORICA & POLITICA, Latin historical and political writing; THESAURUS ERUDITIONIS, a reference collection of dictionaries and handbooks from the period 1500-1750; CERA, printed Latin letters, mostly by German scholars, from the period 1530-1770; and ITALI, works by Italian Renaissance humanists born before 1500. The ITALI collection has no XML files, so it is not included in this repository.

We could not find information on the exact number of XML files produced by CAMENA. This repository contains 949 XML files in POEMATA, 382 in HISTORICA & POLITICA, 296 in THESAURUS ERUDITIONIS, and 124 in CERA, for a total of 1,751 files. These files contain 50,458,045 words (tokens) below the text element (more on this in Word count).
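As a rough illustration of how per-collection figures like these can be gathered, here is a minimal Python sketch using only the standard library. It rests on assumptions of our own: a hypothetical repository layout with one top-level folder per collection (POEMATA/, CERA/, and so on), and a "token" defined as any whitespace-separated run of characters in the text content below the first text element, with any namespace on the element name ignored. The tokenization actually used for the figure above is described in Word count and may differ, so this sketch is not guaranteed to reproduce 50,458,045. Note also that ElementTree will raise an error on files that use undeclared entities.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix so <text> matches with or without a TEI namespace."""
    return tag.rsplit("}", 1)[-1]


def count_tokens_below_text(root: ET.Element) -> int:
    """Count whitespace-separated tokens inside the first <text> element.

    Assumption: a "token" is any run of non-whitespace characters; the
    teiHeader and anything else outside <text> is ignored.
    """
    for elem in root.iter():  # depth-first, document order: outermost <text> first
        if local_name(elem.tag) == "text":
            # itertext() yields every text and tail string inside elem;
            # joining with spaces keeps adjacent text nodes from fusing.
            return len(" ".join(elem.itertext()).split())
    return 0  # no <text> element found


def summarize(repo_root: Path) -> dict[str, tuple[int, int]]:
    """Return {collection: (file_count, token_count)}.

    Hypothetical layout: one top-level folder per collection
    (e.g. POEMATA/, CERA/), with XML files anywhere below it.
    """
    totals: dict[str, tuple[int, int]] = {}
    for xml_path in sorted(repo_root.rglob("*.xml")):
        collection = xml_path.relative_to(repo_root).parts[0]
        files, tokens = totals.get(collection, (0, 0))
        root = ET.parse(xml_path).getroot()
        totals[collection] = (files + 1, tokens + count_tokens_below_text(root))
    return totals
```

Calling summarize(Path("path/to/checkout")) returns a dictionary mapping each collection folder to its file and token counts, which can then be summed. Because text nodes are joined with spaces, a word split by inline markup (for example a highlighted initial) counts as two tokens. This is one of several choices that make such counts method-dependent.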

Not all CAMENA XML files provide the full text of the digitized source. For example, the file Arenhold_conspectus_index_II.xml in CERA offers only the table of contents of the digitized volume of Arenhold, Silvester Johannes: Conspectus Bibliothecae Universalis Historico-Literario-Criticae Epistolarum : Typis Expressarum Et M[anu]S[crip]tarum, Illustrium Omnis Aevi Et Eruditissimorum Auctorum. - Hanoverae : Sumptibus Hereduum [!] Foersterianorum, 1746. In the CAMENA-CERA version, the table of contents links to the respective page images of the digitized book. We made no attempt to exclude such partial XML publications from this repository.
