The page you are currently viewing is a web interface for the pilot version of the Amharic corpus. The corpus currently contains about 23 million tokens. The texts have been automatically annotated with a part-of-speech analyzer. The corpus is disambiguated, i.e. each token is annotated with a single appropriate analysis.
Most tokens in the current version of the corpus belong to news texts. The rest of the texts include blogs and nonfiction (Wikipedia articles and essays). Eventually we intend to increase the number and diversity of texts and add fiction texts to the corpus.
The latest update: May 25th, 2016.
Created by: Maria Obedkova, under the guidance of Boris Orekhov, within the project of the HSE School of Linguistics.
Web interface: The search platform of the Eastern Armenian National Corpus (EANC) was used for this corpus. You can read about making search queries on the EANC help page.
Sunday, March 29, 2020
Amharic Corpus