Coptic Scriptorium

Coptic SCRIPTORIUM (Sahidic Corpus Research: Internet Platform for Interdisciplinary multilayer Methods) is a collaborative, digital project created by Caroline T. Schroeder (University of the Pacific) and Amir Zeldes (Humboldt University).

Coptic SCRIPTORIUM provides a platform for interdisciplinary and computational research on texts in the Coptic language, particularly the Sahidic dialect. As an open-source, open-access initiative, the SCRIPTORIUM technologies and corpus function as a collaborative environment for digital research by any scholars working in Coptic.

We hope SCRIPTORIUM will serve as a model for future digital humanities projects utilizing historical corpora or corpora in languages outside of the Indo-European and Semitic language families.

Please read our Frequently Asked Questions for more information on the project, methodologies, and terminology.

We hosted a workshop on digital research and scholarship in Coptic at Humboldt University on May 14, 2013. The program and presentations are available.

A video introduction to the project, including how to use ANNIS, is available. The latest release notes and news about the project are on C. Schroeder's blog.


The corpora below offer some examples of mark-up for diplomatic transcription and normalization. Most data is available in TEI XML, PAULA XML and relANNIS for use with the ANNIS corpus search software. Links are provided to search the corpus online in ANNIS. Individual documents can also be viewed in HTML for reading purposes in either diplomatic or normalized transcriptions with English translations. [For more information on TEI, PAULA, and ANNIS, check out our FAQ.]

All corpus data generated by the SCRIPTORIUM project is licensed under the Creative Commons Attribution 3.0 Unported License unless otherwise indicated.

Acephalous Work 22 by Shenoute

Abraham Our Father by Shenoute

Letters of Besa

Apophthegmata Patrum

Bible: Gospel of Mark

Note: This corpus is derived from the Sahidica New Testament, which was released by Warren Wells and made available for free electronic distribution for academic use only. It is not licensed CC-BY; see the Sahidica licensing information.


Some of the tools below use a Sahidic Coptic lexicon based on data kindly provided by Prof. Tito Orlandi and the CMCL project. When using the part-of-speech tagging models or the tokenization script and its lexicon, please make sure to refer back to the CMCL project.

Part-of-Speech Tagging




Page last updated 26 March 2014