Wednesday, May 11, 2016

The Digital Hill Project

by Marcel Mernitz
This is a quick overview of the Digital Hill project, which is part of the Open Greek and Latin project at the Humboldt Chair of Digital Humanities at the University of Leipzig. When I started working on the project, the first step was to create a spreadsheet gathering all the sources cited by G.F. Hill (Sources for Greek History between the Persian and Peloponnesian Wars, Oxford 1897) in his third chapter, on the “Revolt of Samos”. The spreadsheet records further information about each source, e.g. whether an XML file already exists in one of our repositories, with a link to it or to the newly created XML file. Any text left out by Hill is stored in a separate column, and the spreadsheet also links to the treebanking and text alignment files I created for the project. The spreadsheet can be accessed via the following link: https://goo.gl/zEcevt
There is a legend in column M that explains the coloured cells.

As part of the project we have created a new repository on GitHub where all the XML and EpiDoc files of the project are stored. The GitHub repository also contains the treebank and text alignment data and the data for the web page. The link for this repository is: https://github.com/DigitalHill

The web page itself is accessible online at http://digitalhill.github.io/

The results can be found under “Chapter III” -> “Revolt of Samos”. There are two subchapters that unfold when clicked, “bilingual alignments” and “ancient alignments”. The first is divided into several tables that the user can open, combining results from treebanking and text alignment. Clicking one of the pen buttons makes the aligned words (all, only verbs, or only nouns, depending on the button) switch colours as in Arethusa, and hovering over an aligned word highlights it together with the words aligned to it, as in Alpheios. There are Greek/Latin – English and Greek – German alignments to show that languages other than English can be used.


The ancient alignments contain Greek – Greek and Greek – Latin alignments. As in the bilingual alignments, the aligned words switch colours when a pen button is clicked and are highlighted when hovered over. I also added further information to the ancient text alignments, which the user can reveal by hovering over the red attention sign (see Figure 1). The web page runs on jQuery scripts I wrote.

Figure 1: Detail of the web page with additional information.
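
The button and hover behaviour described above can be modelled roughly as follows. This is only a sketch in Python, not the jQuery code the web page runs on, and the names `Link`, `links_to_colour` and `hover_partners` as well as the sample words are invented for illustration. It assumes that each alignment link stores the aligned words on both sides plus a part-of-speech label taken from the treebank.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Link:
    """One alignment link (hypothetical structure, not the project's format)."""
    source: Tuple[str, ...]  # words in the ancient text
    target: Tuple[str, ...]  # words aligned to them in the other text
    pos: str                 # part of speech from the treebank, e.g. "verb"


def links_to_colour(links: Iterable[Link], mode: str = "all") -> List[Link]:
    """Return the links a pen button would recolour: "all", "verb" or "noun"."""
    if mode == "all":
        return list(links)
    return [link for link in links if link.pos == mode]


def hover_partners(links: Iterable[Link], word: str) -> Set[str]:
    """Return every word to highlight when `word` is hovered over."""
    for link in links:
        if word in link.source or word in link.target:
            return set(link.source) | set(link.target)
    return set()  # the word is not part of any alignment


# Invented sample data, for illustration only.
SAMPLE = [
    Link(("Σάμιοι",), ("the", "Samians"), "noun"),
    Link(("ἀπέστησαν",), ("revolted",), "verb"),
]
```

With this sample, `links_to_colour(SAMPLE, "verb")` keeps only the second link, and hovering over "Samians" would highlight "Σάμιοι", "the" and "Samians" together.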

A paper presenting all the results is going to be published in the journal “Digital Classics Online”.

While working on the treebanking and text alignment files I was bothered by brackets and stray backslashes that I sometimes had to correct manually in the XML file. For that reason I wrote a little program that cleans up texts automatically. It is based on a web crawler I found while teaching myself Python 3; I had to edit the code on several occasions and add a new method. The second screenshot shows the GUI I created for this program (it is by no means pretty and is still missing menus), with a short sentence typed in several times on the left side and the result on the right side (see Figure 2). There is a method for statistics that lists the words of the text and how often each occurs, but it is not yet available in the GUI. The DH community has not yet settled how ancient Greek characters and diacritics should be encoded and displayed. As soon as these issues are (at least partially) resolved, it will be quite easy to change the algorithm.

Figure 2: The transformation / clean-up tool GUI.
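
To give an idea of what such a clean-up step and the word statistics might look like, here is a minimal sketch in Python 3. It is not the program described above: the function names `clean_text` and `word_frequencies` are made up, and the set of characters to strip (square, curly and round brackets plus backslashes) is an assumption. It also assumes plain text as input, not XML markup.

```python
import re
from collections import Counter

# Assumption: these are the characters that should be removed.
UNWANTED = re.compile(r"[\[\]{}()\\]")


def clean_text(text: str) -> str:
    """Remove unwanted characters and collapse runs of whitespace."""
    stripped = UNWANTED.sub("", text)
    return re.sub(r"\s+", " ", stripped).strip()


def word_frequencies(text: str) -> Counter:
    """Count how often each word occurs in the cleaned text.

    Words are lower-cased and surrounding punctuation is stripped.
    """
    words = (w.strip(".,;:·!?") for w in clean_text(text).lower().split())
    return Counter(w for w in words if w)
```

Because the pattern sits in a single constant, changing which characters are stripped only means editing `UNWANTED`.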

Currently the GUI is only stored on my laptop and is not available online.
I am still working out how to use XSLT 2.0 to bring the inscriptions mentioned by Hill into the web page as well.
