Tuesday, November 29, 2011

JSTOR Early Journal Content Data Bundle

JSTOR Data For Research

About the Early Journal Content (EJC)

The Early Journal Content on JSTOR includes journal articles published in the United States before 1923 and articles published in other countries before 1870. It covers discourse and scholarship in the arts and humanities, economics and politics, and mathematics and other sciences.
On JSTOR, the Early Journal Content is free and available for use by anyone, without registration and regardless of institutional affiliation. The amount of free content will grow over time. As we add more journals to JSTOR, new articles within these time ranges will be added to the Early Journal Content, and will remain freely available.
Making this early journal content freely available is the most recent step in our ongoing work to expand access to content on JSTOR, particularly for individuals who are not affiliated with academic institutions or libraries.

The EJC Data Bundle

We are happy to also make a data bundle for the Early Journal Content freely available to those who would like to conduct data mining or other research across the content.
The data bundle for EJC includes full-text OCR and article and title-level metadata. The Read Me file explains the data in more detail. The currently available data bundle includes all the EJC as of September 7, 2011.
Please note that use of the Early Journal Content bundle is subject to the Early Journal Content Specific Terms and Conditions of Use.
To access the data bundle, please create an account using the very brief registration form, or log in if you already have a Data for Research account. We plan to update the bundle on a semi-regular basis and to alert registrants when the bundle has been updated.
The format of the data bundle is a .tar.gz archive containing a readme file explaining the format of the data files, and an XML file for each article in the Early Journal Content bundle.
Once logged in, you can download the Early Journal Content bundle here.
The size of the bundle is approximately 2.3 GB compressed and 7.2 GB uncompressed.
The bundle was last updated November 11, 2011.
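
Because the bundle is a single .tar.gz archive with one XML file per article, it can be inspected without unpacking all of it to disk. The following is a minimal sketch in Python, not part of the JSTOR documentation. It assumes only what is stated above (a gzipped tar archive containing a readme and per-article XML files); the function name summarize_bundle and the archive path are hypothetical, and the XML schema is not assumed, so the sketch only tallies each file's root element. The Read Me file remains the authority on the actual format.

```python
import sys
import tarfile
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_bundle(archive_path, limit=None):
    """Stream a .tar.gz archive and tally the root element of each XML member.

    Returns (root_tag_counts, other_member_names). The schema of the XML
    files is NOT assumed; only the root tag of each file is recorded.
    """
    root_tags = Counter()
    other_members = []
    seen = 0
    # Mode "r|gz" reads the archive as a forward-only stream, so members
    # are read one at a time and nothing is extracted to disk.
    with tarfile.open(archive_path, mode="r|gz") as archive:
        for member in archive:
            if not member.isfile():
                continue
            if not member.name.lower().endswith(".xml"):
                # For example, the readme file described above.
                other_members.append(member.name)
                continue
            fileobj = archive.extractfile(member)
            try:
                root_tags[ET.parse(fileobj).getroot().tag] += 1
            except ET.ParseError:
                root_tags["<unparseable>"] += 1
            seen += 1
            if limit is not None and seen >= limit:
                break
    return root_tags, other_members


if __name__ == "__main__":
    # Usage: python summarize_bundle.py <path-to-downloaded-bundle.tar.gz>
    tags, others = summarize_bundle(sys.argv[1], limit=1000)
    print("Root elements in sampled XML files:", dict(tags))
    print("Non-XML members seen:", others)
```

Sampling with a limit (1000 files here, an arbitrary choice) gives a quick look at the structure before committing to a full pass over the archive.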