Coptic SCRIPTORIUM provides a platform for interdisciplinary and computational research
in texts in the Coptic language, particularly the Sahidic dialect. As an open-source,
open-access initiative, the SCRIPTORIUM technologies and corpus function as a
collaborative environment for digital research by any scholars working in Coptic. It provides:
tools to process Coptic texts
a searchable, richly-annotated corpus of texts using the ANNIS search and visualization tool
visualizations of Coptic texts
a collaborative platform for scholars to use and contribute to the project
research results generated from the tools and corpus
We hope SCRIPTORIUM will serve as a model for future digital humanities projects
utilizing historical corpora or corpora in languages outside of the Indo-European and
Semitic language families.
The corpora below offer some examples of mark-up for diplomatic transcription and normalization.
Most data is available in TEI XML, PAULA XML and relANNIS for use with the
ANNIS corpus search software. Links are provided to search the corpus online in ANNIS. Individual documents can also be viewed in HTML for reading purposes
in either diplomatic or normalized transcriptions with English translations. [For more information on TEI, PAULA, and ANNIS, check out our FAQ.]
The search and visualization tool ANNIS is the most powerful way to use the texts for research purposes. We've provided some sample queries below to demonstrate the kinds of searches you may construct. ANNIS queries are written in the ANNIS Query Language (AQL), which also accepts regular expressions as annotation values. If you are familiar with ANNIS or regular expressions, jump right in. If not, you may wish to try some of the sample queries and then substitute terms or search parameters to adapt them to your needs and learn the system.
After clicking on the magnifying glass, you will be taken to a new page with the ANNIS query and results. The query will appear in the box on the upper left, the corpus or corpora you are searching will be selected on the lower left, and your search results will appear in the panel on the right.
Search for Greek verbs in multiple corpora: pos="V" & lang="Greek" & #1 _=_ #2
Search for focalizing converters in Besa's letters: pos="CFOC"
Look for locational expressions in the Apophthegmata Patrum corpus: entity="place"
Find some mentions of the following kinship terms in the translation of Abraham Our Father: translation=/.*([Mm]other|[Bb]rother|[Ff]ather|[Ss]ister|[Ss]on|[Dd]aughter).*/
Search for lines ending with a letter written in small print in Besa's letters: hi_rend=/.*small.*/ & lb_n & #1 _r_ #2
See how many lines of Abraham Our Father don't come from the manuscript MONB.YA: lb_n & meta::msName!="MONB.YA"
Find words with the morpheme ⲙⲛⲧ- in Besa's letters and Shenoute's Acephalous Text 22: morph="ⲙⲛⲧ"
Find common nouns referring back to proper names in the Apophthegmata Patrum corpus: pos="N" & pos="NPROP" & entity & entity & #3 ->coref[type=/diff|appos/] #4 & #3 _r_ #1 & #4 _r_ #2
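To see why the kinship query above wraps its alternatives in .* on both sides, here is a minimal Python sketch. It assumes that an ANNIS regular expression must match the entire annotation value, and it uses Python's re module as a stand-in for the ANNIS regex engine, whose dialect may differ in details. The helper name annis_like_match and the sample translation strings are hypothetical, made up for illustration.

```python
import re

# Pattern copied from the kinship sample query above.
KINSHIP = r".*([Mm]other|[Bb]rother|[Ff]ather|[Ss]ister|[Ss]on|[Dd]aughter).*"


def annis_like_match(pattern: str, value: str) -> bool:
    """Return True if the pattern matches the WHOLE value.

    Assumption: ANNIS anchors regular expressions to the full annotation
    value; re.fullmatch imitates that behavior here.
    """
    return re.fullmatch(pattern, value) is not None


# Hypothetical translation strings, not taken from the corpus.
samples = [
    "Abraham our father",
    "He went to the city",
    "a person spoke",
]

for text in samples:
    print(text, "->", annis_like_match(KINSHIP, text))
```

Under that assumption, a bare pattern such as [Ff]ather would only match a translation consisting of that single word, which is why the sample query adds .* before and after. The sketch also shows that the pattern is substring-based: "a person spoke" matches because "person" contains "son", so results of this query may include some false positives.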
Note: This corpus is derived from the Sahidica New Testament, which was released by Warren Wells and made available for free electronic distribution for academic use only. It is not licensed CC-BY; see the Sahidica licensing information.
Some of the tools below use a Sahidic Coptic lexicon based on data kindly provided by Prof. Tito Orlandi and the CMCL project.
When using the part-of-speech tagging models or the tokenization script and its lexicon, please make sure to credit
the CMCL project.