Monday, September 19, 2016

A Latin Macronizer

[First posted in AWOL 20 July 2015, updated (new URL) 19 September 2016]

This automatic macronizer lets you quickly mark all the long vowels in a Latin text. Its accuracy on an average classical text is estimated at about 98% to 99%, so please review the resulting macrons with a critical eye!

The macronization is performed using a part-of-speech tagger (RFTagger) trained on the Latin Dependency Treebank, with the macrons supplied by a customized version of the Morpheus morphological analyzer. An earlier version of this tool was the subject of my bachelor’s thesis in Language Technology, "Automatic annotation of Latin vowel length".
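To illustrate why the tagging step matters, here is a minimal Python sketch of the general idea: once each word carries a morphological tag, the macronized form can be looked up by the pair (word, tag). This is not the macronizer's actual code. The `LEXICON` table, the tag names, and the `macronize` function are all hypothetical; the real tool takes its tags from RFTagger and its vowel lengths from Morpheus.

```python
# Toy lexicon mapping (word form, tag) to a macronized form.
# Both the entries and the tag names are invented for illustration;
# they are not RFTagger's tag set or Morpheus output.
LEXICON = {
    ("rosa", "noun-nom-sg"): "rosa",   # nominative: short final a
    ("rosa", "noun-abl-sg"): "rosā",   # ablative: long final a
    ("est", "verb-3sg-pres"): "est",
}


def macronize(tagged_tokens):
    """Return a macronized form for each (word, tag) pair.

    Pairs missing from the lexicon are returned unchanged.
    """
    return [LEXICON.get((word, tag), word) for word, tag in tagged_tokens]


# The same spelling gets different macrons depending on its tag.
print(macronize([("rosa", "noun-nom-sg"), ("est", "verb-3sg-pres")]))
print(macronize([("in", "prep"), ("rosa", "noun-abl-sg")]))
```

The form "rosa" is ambiguous on its own, and only the tag tells the lookup whether the final vowel is long. A wrong tag therefore produces a wrong macron, which is one reason the output still needs reviewing.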

If you want to run the macronizer locally, or develop it further, you may find the source code on GitHub.

Copyright 2015, 2016 Johan Winge. Please send comments to johan.winge@gmail.com.
