Trismegistos Words
Short Introduction
Trismegistos Words is a new addition to the Trismegistos
universe. It allows searching for lemmata and their declined or
conjugated forms in Greek papyrological texts. It is a spin-off of the
doctoral research of Alek Keersmaekers on the Greek complementation
system. It is currently in beta at best, but we
plan to improve and expand it, perhaps also with syntactic annotations.
Please contact us to point out errors or cooperate in other ways. Note
that we are in close contact with the people of the Papyrological
Navigator and hope to maximize interoperability with this tool, whose
text remains canonical.
Coverage & Accuracy
The starting point of Trismegistos Words was the XML of the texts as it was available in the Papyrological Navigator (PN; papyri.info)
in September 2016. The 4,513,494 words in these ca. 60,000 texts were
tokenized and part-of-speech/morphology and lemma information was added.
On the basis of this XML two MySQL databases were created, one with the
attestations of the words, and one based on that with 18,783
reconstructed lemmata (names excepted). Of these, 14,013 that occur more
than once or have a translation have been selected for the online
version. The reason for this is that the part-of-speech/morphology
tagging, performed by a trained algorithm, is only about 95% accurate,
which may seem high but still implies that there are several thousand
mistakes in the current version, especially for damaged words and
sections.
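The selection rule described above (keep a lemma if it occurs more than once or has a translation) can be sketched as a simple filter. This is only an illustration and not the actual Trismegistos code or database schema: the `Lemma` class, its field names, and the sample counts are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lemma:
    # Hypothetical record layout; the real MySQL schema is not described in the text.
    form: str
    attestations: int
    translation: Optional[str] = None


def select_for_online(lemmata: List[Lemma]) -> List[Lemma]:
    """Keep lemmata that occur more than once or that have a translation."""
    return [l for l in lemmata if l.attestations > 1 or l.translation]


# Invented sample data, for illustration only.
sample = [
    Lemma("λέγω", 120, "to say"),    # frequent and translated: kept
    Lemma("lemma-a", 1, "a gloss"),  # single attestation, but translated: kept
    Lemma("lemma-b", 2, None),       # untranslated, but occurs twice: kept
    Lemma("lemma-c", 1, None),       # single attestation, no translation: dropped
]

selected = select_for_online(sample)
print([l.form for l in selected])
```

A lemma attested only once and lacking a translation is the most likely to be an artifact of a tagging error, which is presumably why such entries are left out of the online version.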
How to cite
In contrast with almost all other databases in Trismegistos, we
currently do not have any numeric stable identifiers for the lemmata,
the attestations of the words, or the morphological analysis. Part of
the reason for this is that we are still reflecting on the best way
to deal with changes in the underlying text, which is connected to the
synergy with the Papyrological Navigator.