Wednesday, March 3, 2021

Open linked data and mapping scholarship in ancient studies

By Christian Casey, CLIR postdoctoral fellow at the ISAW Library, which is publishing today a new interactive, map-based visualization of its collection based on a newly redesigned database of bibliographic items tagged with Pleiades IDs, linked open data URIs for ancient places. In addition to the new visualization, the ISAW Library is also publishing the underlying data set, which now comprises over 4,000 curated bibliographic records in ancient studies, as well as the machine-learning algorithm that programmatically performs the initial assignment of Pleiades IDs to raw bibliographic records.

An image of the new map interface for ISAW New Titles, showing a book mapped to a temple site in Philae, Egypt
Fig. 1: A book in the ISAW Library collection about Philae mapped neatly inside the temple itself in ISAW’s new map visualization.

Once upon a time (June 2009), researchers at Google created a tool called Fusion Tables. Fusion Tables represented an almost utopian use scenario for cloud resources. They allowed anyone with a dataset to upload it into a database for free, link it to other tables uploaded by other people, and generate visualizations directly from their data. This service eliminated millions of person-hours of redundant work. Nearly everyone with a dataset sophisticated enough to be worth looking at needs to visualize that data at some point. Fusion Tables made it possible to generate visualizations at the click of a button.

But what does this have to do with the ISAW Library? 

Well, Fusion Tables made it easy for us to generate a map (built on the Google Maps framework) of all the items received every month using a single column of longitude and latitude data derived from Pleiades that we assigned to each book or journal. Because ISAW comprises researchers who study disparate parts of the ancient world, not everyone needs to know about every book in our library. What if you just wanted to see what we held about sites in Egypt or Central Asia or northern Germany? The ability to generate maps that provided an intuitive, visual grouping of newly-acquired books provided a simple solution to the problem of empowering our users to get the information they wanted in the richest and quickest way possible.

Our initial efforts with the New Titles mapping project were a success, and in 2019 Gabriel Mckee, ISAW’s Librarian for Collections and Services, published a paper on the project in the journal Information Technology and Libraries (ITAL). At a time when libraries are exploring practical methods of integrating semantic web technology into our systems,  we intended this project as an early example of how libraries can leverage the work done by open linked data projects like Pleiades. Geospatial search is only one example of how linked data can enhance the user experience and encourage resource discovery, and over a dozen institutions, including NYU, are now participating in a pilot project to explore methods of integrating linked data URIs into their cataloging workflows. 

In what follows, I will briefly describe the project in more detail and provide links to our new map and a GitHub repository with the underlying data set and some associated code.

New New Titles

Unfortunately, and as with all good things, Fusion Tables came to an end, going offline in December of 2020. This meant that, in order to continue to develop this project, we would need to “roll our own” database and map interface based on a loose collection of data that had previously been sufficient for Google’s large-scale, highly-robust data processing system. Our version does not need to be quite as sophisticated as Google’s, of course, but it does need to work reliably in order to be used by the communities interested in geographically linked bibliographic data, whether that is an ISAW researcher browsing our new titles or someone looking to reuse our entire data set (now over 4,000 items) for their own project (more on this below). Building such a system became a journey of discovery, in which this author learned many useful things about building web applications.

Making a map based on a database seems easy enough on paper, but it is not a simple operation in practice. Creating such a feature from scratch requires: a database; a backend app to query data from that database; and a front-end interface with a map package that displays the data. All parts must be connected to each other via the internet and be hosted and served by a third-party hosting service. Large-scale projects of this sort are generally created by teams of specialists, but at the ISAW Library, we have only ourselves, and the scale of the New Titles map project does not warrant hiring an expensive team of developers to build a full-scale web application. The silver lining is that this project not only fulfilled its original objective of providing a graphic, geographical access point and interface to search our books, but it also generated valuable datasets which will support further research into the geographic features of books in ancient studies more widely (or at least the areas in which ISAW collects). Also, and just as importantly, building it ourselves enables us to dive in and learn more than we could in any other way.

How does it work?

Good question! 

While cataloging new items for our collection, the ISAW Library’s staff adds geographic subject headings. Although this is a standard part of library resource cataloging, we take several extra steps: first, we provide greater specificity than many other libraries would, identifying specific sites rather than countries or provinces for minor archaeological sites. Second, we will apply general geographic headings on titles that other libraries may not treat geographically–for example, identifying Egypt as the geographic subject of general works on papyrology. Third, wherever possible, we include open linked data URIs for geographic places from Pleiades and the Getty Thesaurus of Geographic Names (TGN). Where necessary and appropriate, we will create new Pleiades headings for as-yet unlisted ancient places or sites to include in our cataloging.

At the end of the month, NYU’s Division of Libraries department of Data Analysis & Integration sends us a report of everything added to the collection that month, including URIs and other bibliographic data. From these reports, we generate HTML lists and a Zotero library (another open linked data project). These data also provide a convenient and straightforward training dataset for classifying each book’s subject area. That is, using the geographic data, a machine learning algorithm reliably groups books into general subjects that are useful to ISAW researchers. We then use the geographic URIs in the bibliographic report to assign coordinates for the subject of every title for which geographic representation makes sense. (Currently, this is done through retrieving coordinates from a separate, curated database of places from Pleaides, the TGN, and Wikidata– though we hope that a future version of the map may employ a true linked-data retrieval of coordinates from those data sources.) 


A visualization of automatically-assigned subject labels generated as a result of learning a mapping from geographic location to subject area using the New Titles dataset.
Figure 2: Generating subject labels from ISAW Library New Titles data

The first official version of the new site provides a map interface that is very similar to the old one based on Fusion Tables, with the addition of a few novel features. 

First, we changed the location markers to a set of highly-accessible and colorful symbols (note the legend on the left of the map). 

Second, coding our own interface allowed us to create a new feature in which entries in the list are linked to markers in the map and vice versa. Clicking a location on the map scrolls the list below the map to the proper entry. Clicking entries in the list zooms the map to that book’s specific location. This interaction between map and entry list allows for easier exploration by scholars seeking books related to their area of expertise. 

The interface also groups entries in nearby geographic regions, allowing any visitor to see at a glance how much new material is available in their area of study. The regional grouping and expansion behavior was specifically designed to be as intuitive as easy to use as possible, while also addressing the risks of information overload when large quantities of materials pertain to similar geographic areas. 

Finally, and perhaps most importantly, the database generated for this project represents a pioneering step forward—a link between books and the specifically ancient geographic areas they cover. This dataset, which now contains over 4,000 entries, can be used to train machine-learning algorithms to add helpful data about other books. For instance, we have already used these data to create an algorithm that assigns subject categories based on geographic information found in book titles. But there is no reason to use our map: you can make your own, or do whatever you want with the data and our machine-learning algorithm, since we are publishing both here on GitHub:

[Original post:]


No comments:

Post a Comment