The AWOL Index
This publication systematically describes ancient-world information resources on the
World Wide Web. The bibliographic data presented herein has been programmatically
extracted from the content of AWOL - The Ancient World Online (ISSN 2156-2253) and formatted in
accordance with a structured data model. In continuous operation since 2009, AWOL is a blog authored by Charles E. Jones, Tombros Librarian for
Classics and Humanities at the Pattee Library, Penn State University.
This publication, The AWOL Index, is an experimental project,
developed jointly by Jones and Tom Elliott, the
Associate Director for Digital Programs at New York University's Institute for the Study of the Ancient World (ISAW),
with the assistance of Pavan Atri, Roger Bagnall, Dawn Gross, Sebastian Heath, Gabriel McKee, Ronak Parpani, David Ratzan, and Kristen Soule.
Creation of The AWOL Index was made possible by a grant from
the Gladys Krieble Delmas Foundation.
We extract information from AWOL about both top-level and
subordinate resources. Subordinate resources are those we deduce to be "part of" another
resource (e.g., a single issue of a journal or a subsection of a website). Top-level
resources are the opposite: those described by AWOL
for which we have detected no containing (superior) resource.
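The distinction between top-level and subordinate resources can be illustrated with a short Python sketch. This is not the code used to build The AWOL Index: the field name `is_part_of` and the function `split_resources` are assumptions made for illustration, and the actual records may express the "part of" relationship differently.

```python
from typing import Dict, List, Tuple


def split_resources(records: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Partition records into (top_level, subordinate) lists.

    A record counts as subordinate when it names a containing resource.
    "is_part_of" is a hypothetical field name chosen for this sketch.
    """
    top_level: List[Dict] = []
    subordinate: List[Dict] = []
    for record in records:
        if record.get("is_part_of"):
            # A containing resource was detected: e.g. a journal issue.
            subordinate.append(record)
        else:
            # No containing resource detected: treat as top-level.
            top_level.append(record)
    return top_level, subordinate
```

Under these assumptions, a record for a journal would land in the first list and a record for one of its issues, carrying a reference to the journal, in the second.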
The latest data extraction was performed on 9 July 2015. At that time,
our software successfully extracted 1,301 top-level and 50,704 subordinate resources. For
94% of the top-level resources it was able to extract a textual description
substantively different from the resource title. Dates of individual source posts in
AWOL vary between 2009 and 2015; therefore, content in
The AWOL Index will only be as current as the original
blog post was on the day of extraction.
Table of Contents
- Index of top-level resources by title
A clickable index, sorted alphabetically by resource title, that provides
access to HTML versions of the corresponding
bibliographic records and thence, via additional links, to:
- the original AWOL blog posts,
- the JSON versions of the
bibliographic records (our "native format"),
- derivative records in Zotero, and
- records for subordinate and related resources.
- Index of top-level resources by keywords
Jones assigns categories to blog posts and we capture these as keywords
relating to the indexed resources. We also scan the resource titles mined from
the blog posts to identify additional keywords of interest.
- Directly browseable records
The native-format JSON records and their HTML derivatives are presented in hierarchical, clickable
directory listings, organized by domain of the resource. Individual files are
named in accordance with our record naming strategy.
- JSON for Download
All JSON files are downloadable as a single ZIP file. Please note our copyright and licensing, citation,
and technical guidelines, below.
- Software
If you are interested in exploring how the sausage was made or in helping
us improve our recipe, all code components used in the creation of The AWOL Index have been released under open source
licenses. They are written in the Python
programming language, and released in two packages:
- isaw.awol is used to parse the raw AWOL blog posts to produce the JSON records
- awol-index is used to convert the JSON records into HTML and to
construct the HTML indexes
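For readers who download the ZIP file of JSON records, the following is a minimal Python sketch of how the records might be loaded and grouped by keyword using only the standard library. It is not taken from isaw.awol or awol-index. The field names `title` and `keywords` and the function names are assumptions made for illustration, so check them against the actual records before relying on them.

```python
import json
import zipfile
from collections import defaultdict


def load_records(zip_source):
    """Read every .json member of a ZIP archive into a list of dicts.

    zip_source may be a file path or a binary file-like object.
    """
    records = []
    with zipfile.ZipFile(zip_source) as archive:
        for name in sorted(archive.namelist()):
            if name.endswith(".json"):
                with archive.open(name) as handle:
                    records.append(json.load(handle))
    return records


def index_by_keyword(records):
    """Map each keyword to an alphabetical list of resource titles.

    "title" and "keywords" are hypothetical field names for this sketch.
    """
    index = defaultdict(list)
    for record in records:
        title = record.get("title", "(untitled)")
        for keyword in record.get("keywords") or []:
            index[keyword].append(title)
    return {keyword: sorted(titles) for keyword, titles in sorted(index.items())}
```

Reading directly from the archive avoids unpacking tens of thousands of small files to disk, which may be convenient given the number of subordinate records.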