Tuesday, January 9, 2018

New from Trismegistos: Trismegistos Words

Trismegistos Words

Short Introduction

Trismegistos Words is a new addition to the Trismegistos universe. It allows searching for lemmata and their declined or conjugated forms in Greek papyrological texts. It is a spin-off of the doctoral research of Alek Keersmaekers on the Greek complementation system. It is currently beta at best, but we plan to improve and expand it, perhaps also with syntactic annotations. Please contact us to point out errors or to cooperate in other ways. Note that we are in close contact with the people of the Papyrological Navigator and hope to maximize interoperability with this tool, whose text remains canonical.

Coverage & Accuracy

The starting point of Trismegistos Words was the XML of the texts as it was available in the Papyrological Navigator (PN; papyri.info) in September 2016. The 4,513,494 words in these ca. 60,000 texts were tokenized, and part-of-speech/morphology and lemma information was added. On the basis of this XML two MySQL databases were created: one with the attestations of the words, and one based on that with 18,783 reconstructed lemmata (names excepted). Of these, the 14,013 that occur more than once or have a translation have been selected for the online version. The reason for this selection is that the part-of-speech/morphology tagging, performed by a trained algorithm, is only about 95% accurate. That may seem high, but it still implies that there are several thousand mistakes in the current version - especially for damaged words and sections.
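The selection rule described above (keep a lemma if it occurs more than once or has a translation) can be illustrated with a minimal Python sketch. The `Lemma` class, the `select_for_online` function and the toy data are hypothetical and used for illustration only; they do not reflect the actual Trismegistos database schema or code.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lemma:
    """Hypothetical lemma record; not the actual Trismegistos schema."""
    form: str
    translation: Optional[str] = None


def select_for_online(lemmata: List[Lemma], attestations: List[str]) -> List[Lemma]:
    """Keep lemmata that are attested more than once or have a translation."""
    counts = Counter(attestations)  # number of attestations per lemma form
    return [
        lemma for lemma in lemmata
        if counts[lemma.form] > 1 or lemma.translation
    ]


# Toy data: each attestation is represented by the lemma it was assigned to.
lemmata = [
    Lemma("γράφω", "write"),  # one attestation, but translated: kept
    Lemma("καί", "and"),      # two attestations and translated: kept
    Lemma("ἅπαξ"),            # one attestation, no translation: dropped
    Lemma("δίδωμι"),          # two attestations, no translation: kept
]
attestations = ["καί", "καί", "γράφω", "ἅπαξ", "δίδωμι", "δίδωμι"]

selected = select_for_online(lemmata, attestations)
print([lemma.form for lemma in selected])
```

A filter of this kind removes lemmata that rest on a single untranslated attestation, which is where a tagging error is most likely to have produced a spurious lemma.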

How to cite

In contrast to almost all other databases in Trismegistos, we currently do not have any stable numeric identifiers for the lemmata, the attestations of the words, or the morphological analysis. Part of the reason for this is that we are still reflecting on the best way to deal with changes in the underlying text - which is connected to the synergy with the Papyrological Navigator.
