SEDRA: The Syriac Electronic Data Research Archive

Thursday, March 9, 2023

[First posted in AWOL 1 August 2016, updated 9 March 2023]

About SEDRA
The Syriac Electronic Data Research Archive (SEDRA) is a linguistic and literary database of the Syriac language and literature. Its acronym derives from the Syriac word ܣܕܪܐ sedrā, whose meanings include 'array' and 'series' as well as 'order' and 'rank', all of which are terms associated with database theory.

Project History
SEDRA was established in 1988 by Alaph Beth Computer Systems, a one-person firm founded by George A. Kiraz and based in Los Angeles, that developed, inter alia, Syriac fonts. An early brochure of the company stated that SEDRA "will come on floppy disks in ASCII format."

Kiraz wanted to "crowd source" (analog style) the creation of a linguistic database of the Syriac language. He sent letters to the clients who used his Syriac fonts with the word processor Multi-Lingual Scholar, asking them to volunteer to type the lexica of Margoliouth, Payne Smith and Brockelmann with specific ASCII tagging. A new entry began with a caret (^), and the English glosses of Margoliouth's dictionary were delimited by a percent sign (%). A letter dated March 22, 1990 reports the status of the project and promises to look into the possibilities of using SEDRA "in Artificial Intelligence applications, especially Natural Language Processing." In addition, Kiraz signed an agreement on March 2, 1988 with the Ancient Biblical Manuscript Center for Preservation and Research and obtained permission to use the Peshitta New Testament Electronic Database, originally developed by The Way International.
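The tagging convention described above (caret for a new entry, percent signs around glosses) can be illustrated with a short parser. This is only a sketch: the exact file layout is not documented here, so the code assumes that each gloss sits between a pair of percent signs and that the headword is the first token after the caret. The function name `parse_entries` and the romanized sample headwords are invented for illustration.

```python
import re

def parse_entries(text):
    """Split caret-initial entries and collect percent-delimited glosses."""
    entries = []
    # Assumption: every '^' opens a new entry; text before the first '^' is ignored.
    for chunk in text.split("^")[1:]:
        chunk = chunk.strip()
        if not chunk:
            continue
        # Assumption: a gloss is the text enclosed between a pair of '%' signs.
        glosses = [g.strip() for g in re.findall(r"%([^%]*)%", chunk)]
        # Assumption: the headword is the first whitespace-separated token.
        headword = chunk.split()[0]
        entries.append({"headword": headword, "glosses": glosses})
    return entries

# Invented sample data; not taken from the actual SEDRA files.
sample = "^BYTA %house% %household%\n^KTBA %book%"
parsed = parse_entries(sample)
```

A plain-text convention like this one suits volunteer typists, since it needs no special software and can be checked with very simple tools.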

SEDRA went through three incarnations. SEDRA I (1989) derived its data from the flat file database provided by the Ancient Biblical Manuscript Center. The data was converted to db_VISTA, a database management system that provided a programmable interface in the C programming language for writing database applications.

SEDRA II (1990) contained additional tables and fields necessary for the generation of Kiraz's Concordance to the Syriac New Testament (1993). Moreover, the entire text of the New Testament was vocalized and pointed, punctuation and accent marks were added, and the text was normalized to represent the BFBS edition of the Syriac New Testament as the text used by The Way was based on other manuscripts, primarily from the British Library. To accomplish the vocalization and pointing process, a program was written that skipped over words which had been vocalized before. Hence, the word ܒܝܬܐ 'house,' which appears 201 times in the corpus, is vocalized only once as ܒ݁ܰܝܬ݁ܳܐ. Initial bgdkpt letters were always marked with a quššāyā point; an algorithm was written to convert the quššāyā into rukkākhā if the preceding word, if any, ended in a vowel and was not followed by a punctuation mark. The dot on the feminine object pronominal suffix ܗ̇ was not included in the pointing, and was added later on by another algorithm based on morphological data.
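The two procedures described above, vocalizing each distinct word only once and softening initial bgdkpt letters after a vowel, can be sketched as follows. This is not the original program. To keep the example readable it assumes a made-up romanization in which an uppercase initial letter stands for quššāyā (hard) and a lowercase one for rukkākhā (soft), vowels are the letters a, e, i, o, u, and punctuation marks are separate tokens. The function names and sample words are hypothetical.

```python
BGDKPT = set("bgdkpt")
VOWELS = set("aeiou")
PUNCTUATION = {".", ":", ","}

def vocalize_corpus(words, ask):
    """Vocalize each distinct consonantal form once and reuse the answer."""
    known = {}
    out = []
    for w in words:
        if w not in known:      # only unseen forms are passed to the editor
            known[w] = ask(w)
        out.append(known[w])
    return out

def soften_initials(tokens):
    """Make an initial hard bgdkpt letter soft after a vowel-final word."""
    result = []
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1] if i > 0 else None
        if (
            prev is not None
            and prev not in PUNCTUATION   # punctuation in between blocks the change
            and prev[-1] in VOWELS
            and tok[0].lower() in BGDKPT
            and tok[0].isupper()          # uppercase = hard, in this toy encoding
        ):
            tok = tok[0].lower() + tok[1:]
        result.append(tok)
    return result

# Invented sample data in the toy romanization.
asked = []
def ask_editor(form):
    asked.append(form)
    return {"byt": "Bayta", "mlk": "malka"}[form]

vocalized = vocalize_corpus(["byt", "mlk", "byt"], ask_editor)
softened = soften_initials(["Bayta", "Dmalka", ".", "Ktaba", "Tab"])
```

In the sample, `ask_editor` is consulted twice for three words, and only the words that directly follow a vowel-final word without intervening punctuation are softened.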

The next incarnation of the project was SEDRA III (1991). The first change was the move from a relational database model to a network model, where ordered, one-to-many parent-child relations simplified the process of concordance generation. In this model, a parent record would have a pointer to the first child record in another table. That child record would have a pointer to the next child, and so on. As laptops at the time had small hard drives of about 10 or 20 MB, SEDRA III converted its fields into bit fields. For instance, two bits were sufficient to indicate person (00 for 1st person, 01 for 2nd, 10 for 3rd). SEDRA III contained 2,050 roots, 3,559 lexemes, 31,079 word forms and 6,337 English meanings (particular to the context of the New Testament). It was published in 1993 on the web site of the University of Cambridge, and later on Beth Mardutho's site hosted by The Catholic University of America's Semitics Department, as a non-commercial open source database. A number of developers downloaded SEDRA III and used it for lexical and concordance applications. One such developer was James W. Bennett, who used SEDRA III as the data underlying the BFBS Peshitta in his online Syriac Library Browser and General Syriac Tools.
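The pointer chain and the two-bit person field described above can be sketched in a few lines. The person codes (00, 01, 10) come from the text. Everything else is an assumption made for illustration: the field names `first_word` and `next_word`, the placement of the person code in the two lowest bits, and the sample records.

```python
PERSON_CODES = {"1st": 0b00, "2nd": 0b01, "3rd": 0b10}
PERSON_NAMES = {code: name for name, code in PERSON_CODES.items()}

def pack_person(flags, person):
    """Store the person code in the two lowest bits (bit position is assumed)."""
    return (flags & ~0b11) | PERSON_CODES[person]

def unpack_person(flags):
    """Read the person back out of the two lowest bits."""
    return PERSON_NAMES[flags & 0b11]

def children(parent, child_table):
    """Follow first-child and next-child pointers, yielding children in order."""
    idx = parent["first_word"]
    while idx is not None:
        record = child_table[idx]
        yield record
        idx = record["next_word"]

# Invented sample tables; pointers are list indices into the words table.
words = [
    {"form": "ktab",  "flags": pack_person(0, "3rd"), "next_word": 2},
    {"form": "malka", "flags": pack_person(0, "3rd"), "next_word": None},
    {"form": "ktabt", "flags": pack_person(0, "2nd"), "next_word": None},
]
lexeme = {"lemma": "ktb", "first_word": 0}

forms = [(w["form"], unpack_person(w["flags"])) for w in children(lexeme, words)]
```

Because each parent stores only one pointer and each child stores one more, listing every word form of a lexeme in order needs no search or join, which fits the text's remark that this layout simplified concordance generation.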

In February 2013, George Kiraz and James Bennett teamed up to develop SEDRA IV (this web site). Starting from SEDRA III, the database was converted back into a relational database, the binary fields were expanded (person is now a numeric field with textual references), and additional tables and fields were added. The lexemes table was expanded to include all the words in the Brock-Kiraz dictionary (ca. 15,000 words). Source data from printed lexica were imported in either image or text format. More importantly, a morphological generation component was added to the system with a grant from the International Balzan Prize Foundation under the direction of Peter Brown (Princeton University) and in collaboration with Syriaca.org.

SEDRA IV was launched in March 2015 at the Fourth Hugoye Symposium on Syriac and the Digital Humanities (Beth Mardutho and Rutgers University) and a crowd sourcing call went out asking scholars to tag images of scanned lexical entries to the lexemes of the database. It is expected that SEDRA IV will expand as a crowd sourced project.

As of March 9, 2023, SEDRA contains 3284 roots, 32336 lexemes, and 61445 words. The dictionaries component includes data from:

Source	Number of entries	Already tagged	To be tagged	% complete	Example
Audo	5780	4716	1064	81.59%	ܐܐܪ
Bar Bahlul	5846	5846	0	100.00%	ܐܐܪ
Brockelmann	9956	3111	6845	31.25%	ܐܐܪ
Brockelmann (Subentries)	21196	121	21075	0.57%	ܒܝܫܘܬܐ
Brock & Kiraz	13617	13617	0	100.00%	ܐܐܪ
Ciancaglini	689	689	0	100.00%	ܐܝܙܓܕܐ
Costaz AFSS	5989	5989	0	100.00%	ܐܐܪ
Holes	5	5	0	100.00%	ܕܝܪܐ
Manna-Saome	547	547	0	100.00%	ܐܐܪ
Margoliouth	16081	16081	0	100.00%	ܐܐܪ
Margoliouth Supplement	9247	9097	150	98.38%	ܐܐܪ
Nöldeke:DE	687	280	407	40.76%	ܐܝܟ
Nöldeke:EN	739	309	430	41.81%	ܐܝܟ
Payne Smith	28313	6373	21940	22.51%	ܐܐܪ
SEDRA 3	6123	6123	0	100.00%	ܐܐܪ
Syriaca	1708	1708	0	100.00%	ܐܒܝܠܐ

Lexeme Search

Dictionary Search
Word Search
Gloss Search
Paradigms
Library
