A Full Morphosyntactic Annotation of the State Archives of Assyria Letter Corpus

Creators Ong, Matthew

The dataset consists of a full morphosyntactic annotation of the normalized letter corpus of the State Archives of Assyria online (SAAo), together with metadata on each letter's sender, recipient, estimated date of composition, script, and dialect of Akkadian (where determinable). The corpus comprises ten of the twenty-one current volumes of SAAo and contains approximately 2,600 letters from the royal archives of the late Neo-Assyrian kings. Each letter is annotated for part of speech, lemma, morphological decomposition, and syntactic dependencies of all relevant tokens in the text. The annotations were produced with the help of a spaCy language model and then checked and completed by hand. They are available both as a set of CoNLL-U files (one per text) and as linked open data in a single Turtle (TTL) file. The associated metadata is available as a CSV file and as a TTL file. Because the letters share a format, a set of recurring topics, and a historical period, the corpus forms a natural object of study from both a linguistic and a social-historical perspective. It is hoped that the data will be of use to researchers wishing to carry out linguistic and sociolinguistic corpus research on these texts.
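
As a rough illustration of how the per-text annotation files might be read, the following Python sketch parses CoNLL-U content using only the standard library. It assumes the files follow the standard ten-column, tab-separated CoNLL-U layout. The function name `parse_conllu` and the two-token sample are hypothetical: the sample was invented for this illustration and is not taken from the dataset, whose actual tag inventory and comment lines may differ.

```python
from typing import Dict, Iterator, List

# Standard CoNLL-U column names, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text: str) -> Iterator[List[Dict[str, str]]]:
    """Yield one list of token dicts per sentence in a CoNLL-U string."""
    sentence: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Comment lines (sentence-level metadata) are skipped here.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            # Skip lines that do not have exactly ten columns.
            continue
        sentence.append(dict(zip(FIELDS, cols)))
    if sentence:
        yield sentence


# Invented sample (NOT from the dataset): a two-token phrase.
SAMPLE = (
    "# text = ana šarri\n"
    "1\tana\tana\tADP\t_\t_\t2\tcase\t_\t_\n"
    "2\tšarri\tšarru\tNOUN\t_\t_\t0\troot\t_\t_\n"
)

for sent in parse_conllu(SAMPLE):
    for tok in sent:
        print(tok["id"], tok["form"], tok["lemma"], tok["upos"],
              tok["head"], tok["deprel"])
```

To process a file from the dataset, the same function could be applied to the file's contents read with UTF-8 encoding. A dedicated CoNLL-U library would handle multiword tokens and feature parsing more thoroughly than this sketch does.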


