Amharic Corpus
[First posted in AWOL 4 October 2018, updated 29 March 2020]
The page you are currently viewing is a web
interface for the pilot version of the Amharic corpus. The corpus size so
far is about 23 million tokens. The texts of the corpus have been
automatically annotated with a part-of-speech analyzer. The corpus is disambiguated, i.e. each token is annotated with a single appropriate analysis.
Most tokens in the current version of the
corpus come from news texts. The rest of the texts include blogs and
nonfiction (Wikipedia articles and essays). Eventually we intend to
increase the number and diversity of texts and add fiction texts to the
corpus.
May 25th, 2016.
Maria Obedkova, under the guidance of Boris Orekhov, as part of the project of the HSE School of Linguistics
The search platform of the Eastern Armenian National Corpus (EANC) was used for this corpus. You can read about making search queries on the EANC help page.