March 17th, 2017 by Gabriel Bodard
The Institute of Classical Studies is pleased to announce the appointment of Simona Stoyanova for one year as a new Research Fellow in Library and Information Science on the Cataloguing Open Access Classics Serials (COACS) project, funded by a development grant from the School of Advanced Study.
COACS will draw on various sites that list or index open access (OA) publications, especially journals and serials, in classics and ancient history, so as to produce a resource that subject libraries may use to automatically catalogue the publications and the articles therein. The project is based in the ICS, supervised by the Reader in Digital Classics, Gabriel Bodard, and the Combined Library, with the support of Paul Jackson and Sue Willetts. Other digital librarians and scholars, including Richard Gartner and Raphaële Mouren in the Warburg Institute; Patrick Burns, Tom Elliott and Charles Jones from the Institute for the Study of the Ancient World (NYU); and Matteo Romanello from the German Archaeological Institute, are providing further advice.
Major stages of work will include:
- Survey of AWOL: We shall assess the regularity of metadata in the open access journals listed at AWOL (which currently lists 1521 OA periodicals, containing a little over 50,000 articles), and estimate what proportion of these titles expose metadata in standard formats that would enable harvesting in a form amenable to import into library catalogues. A certain amount of iteration and even manual curation of data is likely to be necessary. The intermediate dataset will need to be updated and incremented over time, rather than overwritten entirely on each import.
- Intermediate data format: We will also decide on the intermediate format (containing MARC data), which in addition to being ingested by the Combined Library will be made available for use by other libraries (e.g. NYU Library and the German Archaeological Institute’s Zenon catalogue). The addition of catalogued OA serials and articles to the library catalogue will significantly contribute to the research practice of scholars and other library users, enabling new research outputs from the Institute and enhancing the open access policy of the School.
- Further OA indexes: Once the proof of concept is in place, and data is being harvested from AWOL (and we have tested that each harvest updates pre-existing titles rather than overwriting or duplicating them), we shall experiment with harvesting similar data from other indexes of OA content, such as DOAJ, OLH, Persée, DialNet, TOCS-IN, and perhaps even institutional repositories.
- Publish open access software: All code for harvesting OA serials and articles, and for ingest by library catalogues, will be made available through GitHub. This code will then be available for updating the intermediate data to take advantage of new titles that are added to AWOL and other resources, and of new issues of serials that are already listed. This will enable reuse of our scripts and data by other libraries and similar institutions.
By the end of the pilot project, we will have: made available and documented the intermediate dataset and the harvesting and ingest code; performed a test ingest of the data into the ICS library catalogue; engaged known (NYU, Zenon, BL) and newly discovered colleagues in potentially adding to and using this data; and explored the possibility of seeking external funding to take this project further.
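The requirement that each harvest should update and increment the intermediate dataset, rather than overwrite or duplicate existing titles, can be illustrated with a minimal Python sketch. This is not the project's code: the `SerialRecord` structure, the `merge_harvest` function, and the choice of a normalised URL as the key for each title are all assumptions made for illustration, and a real implementation would work with MARC records and a more robust identifier.

```python
from dataclasses import dataclass, field


@dataclass
class SerialRecord:
    """One OA serial in a hypothetical intermediate dataset."""
    url: str    # assumed to serve as a stable key for the title
    title: str
    issues: set[str] = field(default_factory=set)


def merge_harvest(dataset: dict[str, SerialRecord],
                  harvested: list[SerialRecord]) -> dict[str, int]:
    """Merge newly harvested records into the dataset in place.

    Unseen titles are added; known titles gain any new issues.
    Nothing already in the dataset is removed or duplicated.
    """
    counts = {"added": 0, "updated": 0, "unchanged": 0}
    for record in harvested:
        # Naive normalisation (assumption): ignore a trailing slash.
        key = record.url.rstrip("/")
        existing = dataset.get(key)
        if existing is None:
            dataset[key] = SerialRecord(key, record.title, set(record.issues))
            counts["added"] += 1
            continue
        new_issues = record.issues - existing.issues
        if new_issues or record.title != existing.title:
            existing.issues |= new_issues
            existing.title = record.title
            counts["updated"] += 1
        else:
            counts["unchanged"] += 1
    return counts
```

Running `merge_harvest` twice over the same dictionary, first with one title and then with that title plus a new issue and a second title, would leave two records in the dataset, with the first title holding both issues. The returned counts give a simple way to check that a repeat harvest has updated rather than duplicated pre-existing titles.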
We consider this project to be a pilot for further work, for which we intend to seek external funding once a proof of concept is in place. We hope to be able to build on this first phase of work by: extending the methods to other disciplines, especially those covered by the other institute libraries in SAS; enabling the harvest of full text from serials whose licences permit it, for search and other textual research such as text-mining and natural language processing; and disambiguating and enhancing the internal and external bibliographical references to enable hyperlinks to primary and secondary sources where available.