This repository contains Ancient Greek texts which have been automatically tokenized, POS-tagged, sentence-split, and lemmatized. The texts come from the following repositories, which currently contain most of the Ancient Greek texts freely accessible over the internet:
As for the tokenization, POS tagging and sentence splitting, the data rely on those provided in:
- https://github.com/PerseusDL/canonical-greekLit/releases/tag/0.0.236
- https://github.com/OpenGreekAndLatin/First1KGreek/releases/tag/1.1.1802
Refer to these repositories for further documentation. In the present repository, the POS tag and the word form of each token have been automatically matched against those contained in Morpheus and MorpheusUnderPhilologic. Since these databases also contain lemmata, the match allowed the lemmata to be extracted automatically.
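The linking step described above can be pictured as a dictionary lookup keyed on the pair (word form, POS tag). The following is only a minimal sketch under stated assumptions, not the procedure actually used for this repository: the `LEXICON` table, the `lemmatize` function, and the tag labels `noun` and `verb` are all hypothetical stand-ins for the Morpheus data and tagset.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon standing in for a Morpheus-style database:
# each (word form, POS tag) pair maps to one or more candidate lemmata.
LEXICON: Dict[Tuple[str, str], List[str]] = {
    ("λόγου", "noun"): ["λόγος"],
    ("λέγει", "verb"): ["λέγω"],
}


def lemmatize(form: str, pos: str) -> List[str]:
    """Return the candidate lemmata for a (form, POS) pair, or [] if no entry matches."""
    return LEXICON.get((form, pos), [])


# Tokens as (word form, POS tag) pairs; the tag labels are illustrative only.
tokens = [("λέγει", "verb"), ("λόγου", "noun"), ("ξξξ", "noun")]
for form, pos in tokens:
    print(form, pos, lemmatize(form, pos))
```

Keying on both the form and the POS tag, rather than on the form alone, is what would let a lookup of this kind separate homographs that belong to different parts of speech. A token with no matching entry simply gets an empty list in this sketch.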
Wednesday, August 9, 2017
Lemmatized Ancient Greek Texts