DCS, the Digital Corpus of Sanskrit, is a searchable collection of lemmatized Sanskrit texts. It offers free internet access to a part of the database of the linguistic program SanskritTagger, which has been under constant development since 1999.
DCS is designed for research in Sanskrit linguistics and philology. Its interfaces make it possible to search for lexical units and their collocations in a corpus of about 3,050,000 manually tagged words. In addition, DCS generates distributional key values and performs statistical tests that can be used to assess the distribution of lexical units from a chronological or user-defined perspective. The digital corpus offers two points of entry for philological research:
- Lexical units can be retrieved from the dictionary via a query interface or a dictionary page. For each lexical unit contained in the corpus, DCS offers the complete set of references and a statistical evaluation based on historical principles.
- As an alternative, philological research can start from the interlinear lexical analysis that accompanies each text contained in the database. This analysis offers easy access to the dictionary, which, in turn, leads to the philological details about a lexical unit.
DCS is based on a relational database whose structure reflects the requirements of philological research. The technical overview of the database design may be helpful for understanding the results obtained using this corpus.
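To give a rough idea of how a relational layout can serve both points of entry, the following is a minimal sketch in Python with SQLite. It does not reproduce the actual DCS schema, which is documented in the technical overview. The table names (`lexeme`, `work`, `occurrence`), the column names, and the toy data are all hypothetical and chosen only for illustration.

```python
import sqlite3


def build_toy_corpus():
    """Create an in-memory database with a hypothetical three-table layout."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE lexeme (id INTEGER PRIMARY KEY, lemma TEXT NOT NULL);
        CREATE TABLE work (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        -- One row per tagged word: links a lexical unit to a place in a text.
        CREATE TABLE occurrence (
            id INTEGER PRIMARY KEY,
            lexeme_id INTEGER NOT NULL REFERENCES lexeme(id),
            work_id INTEGER NOT NULL REFERENCES work(id),
            line_no INTEGER NOT NULL
        );
        """
    )
    # Toy data only; titles and line numbers are placeholders, not real references.
    con.executemany("INSERT INTO lexeme VALUES (?, ?)", [(1, "dharma"), (2, "yoga")])
    con.executemany("INSERT INTO work VALUES (?, ?)", [(1, "Text A"), (2, "Text B")])
    con.executemany(
        "INSERT INTO occurrence (lexeme_id, work_id, line_no) VALUES (?, ?, ?)",
        [(1, 1, 3), (1, 1, 7), (1, 2, 2), (2, 2, 5)],
    )
    return con


def references(con, lemma):
    """Return every (title, line_no) reference for one lexical unit."""
    return con.execute(
        """
        SELECT w.title, o.line_no
        FROM occurrence AS o
        JOIN lexeme AS l ON l.id = o.lexeme_id
        JOIN work AS w ON w.id = o.work_id
        WHERE l.lemma = ?
        ORDER BY w.title, o.line_no
        """,
        (lemma,),
    ).fetchall()


def counts_per_work(con, lemma):
    """Return (title, count) pairs: a simple distribution of a lemma over texts."""
    return con.execute(
        """
        SELECT w.title, COUNT(*)
        FROM occurrence AS o
        JOIN lexeme AS l ON l.id = o.lexeme_id
        JOIN work AS w ON w.id = o.work_id
        WHERE l.lemma = ?
        GROUP BY w.title
        ORDER BY w.title
        """,
        (lemma,),
    ).fetchall()


if __name__ == "__main__":
    con = build_toy_corpus()
    print(references(con, "dharma"))
    print(counts_per_work(con, "dharma"))
```

In this sketch, starting from the dictionary corresponds to filtering `occurrence` by a lemma, while starting from a text would correspond to filtering the same table by a work and following each row back to its lexical unit.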
As a first step, it is recommended to use either the query or the dictionary page to search for lexical units, or to have a look at the text collection. These pages can also be accessed using the main menu at the top of this page. The help center gives a structured and detailed introduction to the functions of DCS.
The website is optimized for Mozilla Firefox.
Version 1.2, July 2011