Saturday, September 26, 2015

The AWOL Index

This publication systematically describes ancient-world information resources on the world-wide web. The bibliographic data presented herein has been programmatically extracted from the content of AWOL - The Ancient World Online (ISSN 2156-2253) and formatted in accordance with a structured data model. In continuous operation since 2009, AWOL is a blog authored by Charles E. Jones, Tombros Librarian for Classics and Humanities at the Pattee Library, Penn State University.
This publication, The AWOL Index, is an experimental project, developed jointly by Jones and Tom Elliott, the Associate Director for Digital Programs at New York University's Institute for the Study of the Ancient World (ISAW), with the assistance of Pavan Atri, Roger Bagnall, Dawn Gross, Sebastian Heath, Gabriel McKee, Ronak Parpani, David Ratzan, and Kristen Soule.

Creation of The AWOL Index was made possible by a grant from the Gladys Krieble Delmas Foundation.
We extract information from AWOL about both top-level and subordinate resources. Subordinate resources are those we deduce to be "part of" another resource (e.g., a single issue of a journal or a sub-section of a website). Top-level resources are the opposite: those described in AWOL for which we have detected no containing (parent) resource.
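
The distinction between top-level and subordinate resources can be illustrated with a short Python sketch. It assumes a hypothetical record shape in which each record is a dictionary with an "id" and an optional "is_part_of" key naming its containing resource; these field names are illustrative and are not taken from the actual AWOL Index JSON format.

```python
from typing import Iterable

# Hypothetical record shape (an assumption, not the real AWOL Index schema):
# {"id": "...", "is_part_of": "<id of containing resource>"}  # key optional
Record = dict


def split_by_level(records: Iterable[Record]) -> tuple[list[Record], list[Record]]:
    """Partition records into (top_level, subordinate).

    A record counts as subordinate if a containing resource was detected,
    i.e. it has a non-empty "is_part_of" value; otherwise it is top-level.
    """
    top_level: list[Record] = []
    subordinate: list[Record] = []
    for rec in records:
        if rec.get("is_part_of"):
            subordinate.append(rec)
        else:
            top_level.append(rec)
    return top_level, subordinate
```

For example, a journal record with no "is_part_of" value lands in the first list, while a record for one of its issues, whose "is_part_of" names the journal, lands in the second.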

The latest data extraction was performed on 9 July 2015. At that time, our software successfully extracted 1,301 top-level and 50,704 subordinate resources. For 94% of the top-level resources it was able to extract a textual description substantively different from the resource title. Dates of the individual source posts in AWOL range from 2009 to 2015, so each record in The AWOL Index is only as current as its source blog post was on the day of extraction.

Table of Contents

  • Index of top-level resources by title
    A clickable index, sorted alphabetically by resource title, that provides access to HTML versions of the corresponding bibliographic records and thence, via additional links, to:
    • the original AWOL blog posts,
    • the JSON versions of the bibliographic records (our "native format"),
    • derivative records in Zotero, and
    • records for subordinate and related resources.
  • Index of top-level resources by keywords
    Jones assigns categories to blog posts and we capture these as keywords relating to the indexed resources. We also check the contents of resource titles we mine from the blog posts in an effort to identify additional keywords of interest.
  • Directly browseable records
    The native-format JSON records and their HTML derivatives are presented in hierarchical, clickable directory listings, organized by the resource's domain. Individual files are named in accordance with our record naming strategy.
  • JSON for Download
    All JSON files are downloadable as a single ZIP file. Please note our copyright and licensing, citation, and technical guidelines, below.
  • Software
    If you are interested in exploring how the sausage was made or in helping us improve our recipe, all code components used in the creation of The AWOL Index have been released under open source licenses. They are written in the Python programming language, and released in two packages:
    • isaw.awol is used to parse the raw AWOL blog posts to produce the JSON records
    • awol-index is used to convert the JSON records into HTML and to construct the HTML indexes
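
The keyword step described above for the index by keywords might look something like the following sketch, which merges a blog post's categories with keywords inferred from words in the resource title. The vocabulary in TITLE_KEYWORDS and the function name keywords_for are hypothetical; the actual rules used by isaw.awol are not reproduced here.

```python
import re

# Hypothetical title vocabulary (illustrative only, not the real AWOL Index
# rules): maps a lowercase word found in a title to the keyword it implies.
TITLE_KEYWORDS = {
    "papyri": "papyrology",
    "papyrus": "papyrology",
    "inscriptions": "epigraphy",
    "journal": "journals",
}


def keywords_for(title: str, blog_categories: list[str]) -> set[str]:
    """Combine a post's categories with keywords inferred from the title."""
    # Categories assigned to the blog post are kept as keywords directly.
    keywords = {c.strip() for c in blog_categories if c.strip()}
    # Split the title into lowercase ASCII words and look each one up.
    for word in re.findall(r"[a-z]+", title.lower()):
        if word in TITLE_KEYWORDS:
            keywords.add(TITLE_KEYWORDS[word])
    return keywords
```

A simple word lookup like this is easy to audit, at the cost of missing variant spellings and non-ASCII words that a fuller matcher would catch.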
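
For readers working with the JSON download, the following sketch loads every .json member of the ZIP file using Python's standard zipfile and json modules, then groups the records by the host name of their URL, roughly mirroring the domain-based directory listings. It assumes each file holds one record and that each record stores its address in a hypothetical "url" field; the real field names and file-naming strategy may differ.

```python
import json
import zipfile
from collections import defaultdict
from urllib.parse import urlsplit


def load_records(zip_path: str) -> list[dict]:
    """Read every .json member of the downloaded ZIP file as one record."""
    records = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.endswith(".json"):
                with zf.open(name) as fh:
                    # json.load accepts the binary stream (UTF-8 JSON).
                    records.append(json.load(fh))
    return records


def group_by_domain(records: list[dict]) -> dict[str, list[dict]]:
    """Group records by the host name of their (assumed) "url" field."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        host = urlsplit(rec.get("url", "")).hostname or "unknown"
        groups[host].append(rec)
    return dict(groups)
```

Usage would be along the lines of group_by_domain(load_records(path)), where path points at the downloaded ZIP file.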
