By Christian Casey, CLIR postdoctoral fellow at the ISAW Library,
which is publishing today a new interactive, map-based visualization of
its collection based on a newly redesigned database of bibliographic
items tagged with Pleiades IDs,
linked open data URIs for ancient places. In addition to the new
visualization, the ISAW Library is also publishing the underlying data
set, which now comprises over 4,000 curated bibliographic records in
ancient studies, as well as the machine-learning algorithm that
programmatically performs the initial assignment of Pleiades IDs to raw
bibliographic records.
Fig.
1: A book in the ISAW Library collection about Philae mapped neatly
inside the temple itself in ISAW’s new map visualization.
Once upon a time (June 2009), researchers at Google created a tool called Fusion Tables.
Fusion Tables represented an almost utopian use scenario for cloud
resources. They allowed anyone with a dataset to upload it into a
database for free, link it to other tables uploaded by other people, and
generate visualizations directly from their data. This service
eliminated millions of person-hours of redundant work. Nearly everyone
with a dataset sophisticated enough to be worth looking at needs to
visualize that data at some point. Fusion Tables made it possible to
generate visualizations at the click of a button.
But what does this have to do with the ISAW Library?
Well, Fusion Tables made it easy for us to generate a map (built on
the Google Maps framework) of all the items received every month using a
single column of longitude and latitude data derived from Pleiades that
we assigned to each book or journal. Because ISAW comprises researchers
who study disparate parts of the ancient world, not everyone needs to
know about every book in our library. What if you just wanted to see
what we held about sites in Egypt or Central Asia or northern Germany?
The ability to generate maps that provided an intuitive, visual grouping
of newly-acquired books provided a simple solution to the problem of
empowering our users to get the information they wanted in the richest
and quickest way possible.
Our initial efforts with the New Titles mapping project were a
success, and in 2019 Gabriel Mckee, ISAW’s Librarian for Collections and
Services, published a paper on the project in the journal Information Technology and Libraries (ITAL).
At a time when libraries are exploring practical methods of integrating
semantic web technology into our systems, we intended this project as
an early example of how libraries can leverage the work done by open
linked data projects like Pleiades. Geospatial search is only one
example of how linked data can enhance the user experience and encourage
resource discovery, and over a dozen institutions, including NYU, are
now participating in a pilot project to explore methods of integrating linked data URIs into their cataloging workflows.
Unfortunately, and as with all good things, Fusion Tables came to an
end, going offline in December of 2020. This meant that, in order to
continue to develop this project, we would need to “roll our own”
database and map interface based on a loose collection of data that had
previously been sufficient for Google’s large-scale, highly-robust data
processing system. Our version does not need to be quite as
sophisticated as Google’s, of course, but it does need to work reliably
in order to be used by the communities interested in geographically
linked bibliographic data, whether that is an ISAW researcher browsing
our new titles or someone looking to reuse our entire data set (now over
4,000 items) for their own project (more on this below). Building such a
system became a journey of discovery, in which this author learned many
useful things about building web applications.
Making a map based on a database seems easy enough on paper, but it
is not a simple operation in practice. Creating such a feature from
scratch requires: a database; a backend app to query data from that
database; and a front-end interface with a map package that displays the
data. All parts must be connected to each other via the internet and be
hosted and served by a third-party hosting service. Large-scale
projects of this sort are generally created by teams of specialists, but
at the ISAW Library, we have only ourselves, and the scale of the New
Titles map project does not warrant hiring an expensive team of
developers to build a full-scale web application. The silver lining is
that this project not only fulfilled its original objective of providing
a graphic, geographical access point and interface to search our books,
but it also generated valuable datasets which will support further
research into the geographic features of books in ancient studies more
widely (or at least the areas in which ISAW collects). Also, and just as
importantly, building it ourselves enables us to dive in and learn more
than we could in any other way.
How does it work?
Good question!
While cataloging new items for our collection, the ISAW Library’s
staff adds geographic subject headings. Although this is a standard part
of library resource cataloging, we take several extra steps: first, we
provide greater specificity than many other libraries would, identifying
specific sites rather than countries or provinces for minor
archaeological sites. Second, we will apply general geographic headings
on titles that other libraries may not treat geographically–for example,
identifying Egypt as the geographic subject of general works on
papyrology. Third, wherever possible, we include open linked data URIs for geographic places from Pleiades and the Getty Thesaurus of Geographic Names (TGN).
Where necessary and appropriate, we will create new Pleiades headings
for as-yet unlisted ancient places or sites to include in our
cataloging.
At the end of the month, NYU’s Division of Libraries department of Data Analysis & Integration sends
us a report of everything added to the collection that month, including
URIs and other bibliographic data. From these reports, we generate HTML
lists and a Zotero library
(another open linked data project). These data also provide a convenient
and straightforward training dataset for classifying each book’s
subject area. That is, using the geographic data, a machine learning
algorithm reliably groups books into general subjects that are useful to
ISAW researchers. We then use the geographic URIs in the bibliographic
report to assign coordinates for the subject of every title for which
geographic representation makes sense. (Currently, this is done through
retrieving coordinates from a separate, curated database of places from
Pleaides, the TGN, and Wikidata– though we hope that a future version of
the map may employ a true linked-data retrieval of coordinates from
those data sources.)
Results
Figure 2: Generating subject labels from ISAW Library New Titles data
The first official version of the new site provides
a map interface that is very similar to the old one based on Fusion
Tables, with the addition of a few novel features.
First, we changed the location markers to a set of highly-accessible
and colorful symbols (note the legend on the left of the map).
Second, coding our own interface allowed us to create a new feature
in which entries in the list are linked to markers in the map and vice
versa. Clicking a location on the map scrolls the list below the map to
the proper entry. Clicking entries in the list zooms the map to that
book’s specific location. This interaction between map and entry list
allows for easier exploration by scholars seeking books related to their
area of expertise.
The interface also groups entries in nearby geographic regions,
allowing any visitor to see at a glance how much new material is
available in their area of study. The regional grouping and expansion
behavior was specifically designed to be as intuitive as easy to use as
possible, while also addressing the risks of information overload when
large quantities of materials pertain to similar geographic areas.
Finally, and perhaps most importantly, the database generated for
this project represents a pioneering step forward—a link between books
and the specifically ancient geographic areas they cover. This dataset,
which now contains over 4,000 entries, can be used to train
machine-learning algorithms to add helpful data about other books. For
instance, we have already used these data to create an algorithm that
assigns subject categories based on geographic information found in book
titles. But there is no reason to use our map: you can make your own,
or do whatever you want with the data and our machine-learning
algorithm, since we are publishing both here on GitHub:
The AWOL Index: The bibliographic data presented herein has been programmatically extracted from the content of AWOL - The Ancient World Online (ISSN 2156-2253) and formatted in accordance with a structured data model.
AWOL is a project of Charles E. Jones, Tombros Librarian for Classics and Humanities at the Pattee Library, Penn State University
AWOL began with a series of entries under the heading AWOL on the Ancient World Bloggers Group Blog. I moved it to its own space here beginning in 2009.
The primary focus of the project is notice and comment on open access material relating to the ancient world, but I will also include other kinds of networked information as it comes available.
The ancient world is conceived here as it is at the Institute for the Study of the Ancient World at New York University, my academic home at the time AWOL was launched. That is, from the Pillars of Hercules to the Pacific, from the beginnings of human habitation to the late antique / early Islamic period.
AWOL is the successor to Abzu, a guide to networked open access data relevant to the study and public presentation of the Ancient Near East and the Ancient Mediterranean world, founded at the Oriental Institute, University of Chicago in 1994. Together they represent the longest sustained effort to map the development of open digital scholarship in any discipline.
No comments:
Post a Comment