Saturday, March 8, 2014

HdtDep: Syntactic dependencies search engine in Herodotus' Book 1

 [First posted in AWOL 22 June 2011. Updated 8 March 2014]

HdtDep is a search engine for a treebank consisting of the first book of Herodotus' Histories. The treebank is encoded in an XML file based on A. Godley's Loeb edition (1920), available under a Creative Commons Attribution-ShareAlike 3.0 United States License on the Perseus Project website. All typos have been corrected. 

The Greek characters are encoded in the UTF-8 Unicode format. The XML file is structured in <chapter> and <sentence> nodes, which contain <word> nodes. All punctuation has been removed. Since Unicode encodes graphemes with different diacritics as distinct characters, all grave accents have been turned into acutes (in order to improve searchability). Enclisis accents have been removed. All elided vowels have been restored. Moreover, all crasis forms have been resolved into uncontracted words, in order to correctly represent their syntactic relationship.
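
Because the treebank stores acutes where printed editions have graves, a search string copied from a printed text may need the same normalization before it will match. The following is a minimal Python sketch of one way to do this with the standard library; the function name grave_to_acute is made up for illustration, and nothing here is taken from HdtDep's own code.

```python
import unicodedata

COMBINING_GRAVE = "\u0300"
COMBINING_ACUTE = "\u0301"

def grave_to_acute(text: str) -> str:
    """Replace every grave accent with an acute, keeping breathings and other marks.

    Hypothetical helper: it illustrates the normalization described above,
    not the procedure actually used to build the treebank.
    """
    # Decompose precomposed letters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Swap the combining grave for a combining acute.
    swapped = decomposed.replace(COMBINING_GRAVE, COMBINING_ACUTE)
    # Recompose into precomposed characters where they exist.
    return unicodedata.normalize("NFC", swapped)

query = grave_to_acute("τὸν")
```

Note that recomposition with NFC yields the "tonos" code points of the basic Greek block for plain acutes; if the treebank file uses the "oxia" code points of the Greek Extended block instead, both the query and the data would have to be passed through the same normalization form before comparison.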

The syntactic structure of the sentences has been described by applying an adapted version of Igor Mel'čuk's dependency theory (Mel'čuk 1988: Dependency Syntax: Theory and Practice, Albany; Mel'čuk 2009: 'Dependency in Natural Language', in A. Polguère & I. Mel'čuk, Dependency in Linguistic Description, Amsterdam - Philadelphia: 1-110; Vatri 2011: Syntactic dependencies in Classical Greek [submitted]). Each word is annotated with the element it depends on and its grammatical category/sub-category (see below). Nouns, adjectives and verbs also contain the Attic lexical entry under which they appear in LSJ. The syntactic relationship types, whose interpretation is highly theory-dependent, have not been encoded.
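
To make the annotation scheme more concrete, here is a small Python sketch of how such a file might be queried for the dependents of a given word. Only the <chapter>, <sentence> and <word> element names come from the description above; the <book> root element, the attribute names (id, head, pos, lemma), the category labels and the sample sentence are all assumptions invented for illustration and may not match the actual HdtDep file.

```python
import xml.etree.ElementTree as ET

# Toy sample, NOT taken from the treebank: the root element and all
# attribute names and values are hypothetical.
SAMPLE = """<book>
  <chapter n="1">
    <sentence n="1">
      <word id="1" head="2" pos="noun" lemma="Κροῖσος">Κροῖσος</word>
      <word id="2" head="0" pos="verb" lemma="εἰμί">ἦν</word>
      <word id="3" head="2" pos="noun" lemma="Λυδός">Λυδός</word>
    </sentence>
  </chapter>
</book>"""

def dependents(sentence, head_id):
    """Return the text of every <word> whose (assumed) head attribute is head_id."""
    return [w.text for w in sentence.iter("word") if w.get("head") == head_id]

root = ET.fromstring(SAMPLE)
sentence = root.find("./chapter/sentence")
verb_dependents = dependents(sentence, "2")
print(verb_dependents)
```

Since relationship types are not encoded, a query of this kind can only say that a word depends on another, not whether it does so as subject, object or modifier.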

