WORDHOARD: An application for the close reading and scholarly analysis of deeply tagged texts

[First posted in AWOL 4 March 2010, updated 10 October 2014]

The WordHoard project is named after an Old English phrase for the verbal treasure 'unlocked' by a wise speaker. It applies the insights and techniques of corpus linguistics, that is, the empirical and computer-assisted study of large bodies of written texts or transcribed speech, to highly canonical literary texts. In the WordHoard environment, such texts are annotated or tagged by morphological, lexical, prosodic, and narratological criteria. They are mediated through a 'digital page', a user interface that lets scholarly but non-technical users explore the greatly increased query potential of textual data kept in such a form.
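
To make the idea of "deeply tagged" text concrete, here is a minimal sketch in Python. It is not WordHoard's data model or API: the `TaggedWord` class, its field names, the tag values, and the four sample words are all hypothetical and invented for illustration. The point is only that once each word carries a lemma, a part of speech, and prosodic and narratological attributes, queries that plain string search cannot express become simple filters.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedWord:
    """One word occurrence with illustrative (hypothetical) tags."""
    spelling: str   # surface form as printed
    lemma: str      # lexical tag: dictionary headword
    pos: str        # morphological tag: part of speech
    is_verse: bool  # prosodic tag: verse or prose
    speaker: str    # narratological tag: who speaks the word


# Hand-made sample data, not taken from WordHoard's corpora.
words = [
    TaggedWord("loves", "love", "verb", True, "Juliet"),
    TaggedWord("love", "love", "noun", True, "Romeo"),
    TaggedWord("loved", "love", "verb", False, "Nurse"),
    TaggedWord("night", "night", "noun", True, "Juliet"),
]


def find(words, **criteria):
    """Return the words whose tags match every given criterion."""
    return [
        w for w in words
        if all(getattr(w, field) == value for field, value in criteria.items())
    ]


# All verbal uses of the lemma "love", whatever the spelling.
verb_love = find(words, lemma="love", pos="verb")

# How often each speaker uses the lemma "love" in any form.
love_by_speaker = Counter(w.speaker for w in find(words, lemma="love"))
```

A plain search for the string "love" would miss the distinction between noun and verb and could not be restricted by speaker or by verse and prose; with tags, each restriction is one more keyword argument to `find`.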

It is a basic assumption of WordHoard that new kinds of historical, literary, or broadly cultural analysis will be supported through the forms of data access that are made possible when literary texts are treated in the manner of linguistic corpora. Deeply tagged corpora of course support more finely grained inquiries at a verbal or stylistic level. But more importantly, access to the words of a text at such microscopic levels also lets you look in new ways at the imaginative worlds created by those words.

In its current release WordHoard contains the entire canon of Early Greek epic in the original and in translation, as well as all of Chaucer, Shakespeare, and Spenser. The section on Provenance, Copyrights, and Licenses provides detailed information about the texts.

Preface

Understanding WordHoard

What is WordHoard?

Metadata and the Query Potential of the Digital Surrogate

Working with Very Common and Very Rare Words

The Corpora and Tagging Data

A Hands-On Introduction to WordHoard

Getting Started

The Basics

Reading

The Table of Contents Window

Getting Information about Works

Displaying and Reading Works

Getting Information about Words

Lexicons

Getting Information about Lemmas

Parts of Speech and Word Classes

Translations, Transliterations and Transcriptions

The Iliad Scholia and E. K. Annotations

Searching

Searching for Words

Concordances

Searching for Lemmas

Searching for Works

Accounts

Logging In and Logging Out

Managing Accounts

Saved Queries

Introduction to Saved Queries

Saved Word Queries

Saved Bibliographic Queries

Work and Word Sets

Introduction to Work and Word Sets

Work Sets

Word Sets

Exporting and Importing Sets

Statistical Analysis

Introduction to Analysis Methods

Calculator Window

Displaying Word Form Lists

Comparing Word Form Counts

Tracking Word Form Use Over Time

Finding Collocates

Finding Multiword Units

Comparing Collocates

Comparing Texts

Annotations

Annotations

Scripting

Introduction to scripting

Mathematical functions

Utility functions

Script example: How many words are unique to each Shakespeare work?

Notes for Developers

Introduction for Developers

Files and Setup

Documents

Installing the Database

Hibernate changes

Building the Source Code

Building the Static Object Model

Deployment

Adding New Texts

Introduction for Text Developers

The Corpora XML File

The Authors XML File

The Word Classes XML File

Parts of Speech XML File

Work XML Files

Standard Spelling XML Files

The Benson Gloss XML File

Translation XML Files

Annotation XML Files

The Work Sets XML File

Martin Mueller's Tagging Data

Addenda

Software Copyrights and Licenses

The Texts: Provenance, Copyrights and Licenses

Version History

The Mac vs. Windows Wars

NU Deployment Notes