9/29/2016 “Modeling a Body of Literature in TEI” @TEI Conference & Members’ Meeting

Posted on Monday, September 26, 2016 in Uncategorized.

Syriaca.org editor Dr. Nathan P. Gibson will present a poster coauthored with Prof. David A. Michelson, “Modeling a Body of Literature in TEI: The New Handbook of Syriac Literature”, at the TEI Conference & Members’ Meeting in Vienna, Austria on September 29, 2016. The poster and poster text, not including the schema and commentary, are below:

Poster: “Modeling a Body of Literature in TEI” @TEI Conference & Members’ Meeting 2016


The New Handbook of Syriac Literature (NHSL) is a born-digital TEI-encoded reference work for the study of Syriac literature. The first volume, Bibliotheca Hagiographica Syriaca Electronica, was published by Syriaca.org in 2016 using a simple TEI schema to describe a single genre (hagiography).[1] In preparation for expanding the NHSL to include other genres, Syriaca.org is revising this TEI schema. The authors actively seek feedback, suggestions, and criticism concerning this revised schema.


Syriac is a language which once flourished on the Mesopotamian plateau. A dialect of Aramaic, Syriac was widely used during much of the first millennium of the common era and continues to be used today by a world-wide diaspora. There is currently no comprehensive inventory of Syriac literature. Syriaca.org is a research collaborative building digital tools to “re-sort” and “re-orient” the field of Syriac literature.[2] One such tool is NHSL. Each NHSL entry uniquely identifies a “work” (see below) and links to related digital and print resources if possible. The NHSL also seeks to describe works even if they have never been edited or published, by providing titles and excerpts, author information, manuscript attestations, bibliography, and descriptions of language, genre, and/or subject. The NHSL also documents the relationships between works, authors, manuscripts, and the geographic places with which they are associated.

Although there is a long scholarly precedent for using TEI to encode ancient and medieval texts, past practice has focused on describing specific manuscripts (or text-bearing objects) or creating editions of works. We found that the use of TEI to encode born-digital metadata about works was less common. Similarly, in the TEI community at large the <bibl> element and its sibling <biblStruct> are often used to represent bibliographic information for specific publications of a work, but rarely for “works” in the abstract or conceptual sense. This poster presents our approach to modeling such a body of literature in TEI.

The Work Entity

The core entity being modeled in the NHSL is a “work”, which is “a distinct intellectual or artistic creation” as defined by FRBR.[3] In NHSL each work is encoded within a <bibl> element comprising the entire <body> of the TEI document. FRBR work entities are abstract concepts distinct from any historical exemplar. Thus Plato’s “Allegory of the Cave” would be a “work” distinct from any extant individual manuscript copies or printed editions of it. The benefit of the FRBR model for NHSL is its capability for grouping related manuscript items, editions, and translations around a conceptual work. For practical reasons, NHSL has simplified the FRBR model into two entities: “works” and “citations”. This two-fold organization is useful because it renders the NHSL easily compatible with de facto descriptive practices (as reflected in RDF vocabularies) of major online catalogues such as worldcat.org, openlibrary.org, dnb.de, and catalog.perseus.org.[4] Such compatibility is necessitated by our goal of linking each “work” to related digital, print, and manuscript citations. Theoretically, our work could be expanded in the future to include the full FRBR taxonomy.
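To make the two-entity model concrete, here is a minimal sketch of what a work record might look like and how it could be read. It is not the NHSL schema: the record content, the example.org URIs, the type="citation" value, and the load_work helper are all assumptions made for illustration. The only details taken from the text above are that a single <bibl> fills the <body> and that citations hang off the work.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical work record. Titles, names, and URIs are invented placeholders,
# not real Syriaca.org data.
SAMPLE = """\
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample work record</title></titleStmt>
      <publicationStmt><p>Illustration only.</p></publicationStmt>
      <sourceDesc><p>Born digital.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <!-- The "work": one <bibl> comprising the entire <body>. -->
      <bibl xml:id="work-1">
        <title xml:lang="en">Life of an Example Saint</title>
        <author ref="http://example.org/person/1">Example Author</author>
        <idno type="URI">http://example.org/work/1</idno>
        <!-- A "citation": a concrete edition grouped under the work. -->
        <bibl type="citation">
          <title>A printed edition of the work</title>
          <ptr target="http://example.org/bibl/1"/>
        </bibl>
      </bibl>
    </body>
  </text>
</TEI>
"""


def load_work(xml_text):
    """Return the work URI, its own titles, and the titles of its citations."""
    root = ET.fromstring(xml_text)
    work = root.find("tei:text/tei:body/tei:bibl", NS)
    return {
        "uri": work.findtext("tei:idno[@type='URI']", namespaces=NS),
        # Direct children only, so titles of nested citations are excluded.
        "titles": [t.text for t in work.findall("tei:title", NS)],
        "citations": [
            c.findtext("tei:title", namespaces=NS)
            for c in work.findall("tei:bibl", NS)
        ],
    }


record = load_work(SAMPLE)
print(record["uri"], record["titles"], record["citations"])
```

Keeping the abstract work as the outer <bibl> and each edition or manuscript reference as a nested <bibl> is one way to mirror the “works” and “citations” split; other nestings would serve equally well.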

TEI as Metadata Format

Given our project goals, we found a number of advantages in using TEI over other metadata and bibliographic formats such as MARC XML, Dublin Core, MADS, and EAD. In methodological terms, NHSL follows the documentation norms of the discipline of History, requiring extensive use of footnotes to indicate the provenance of information. Of the above data formats, only TEI’s @source attribute combined with the TEI’s model.biblLike class of elements provided the comprehensive provenance mechanism needed to meet this standard. While other formats do have some ability to indicate sourcing (including @source in EAD), their sourcing mechanisms are neither universally available throughout the data model nor able to reproduce a bibliography of citations. Because TEI permits customization such that @source could be used in a standard way with any element, we are able to record and attribute multiple and even contradictory historical claims to serve our scholarly users.

A second benefit of using TEI comes from the relative compatibility of the TEI data model with our project-specific needs for RDF serialization. TEI customization provided the simplest method of embedding URIs directly in the data (via <idno>, @ref, <relation>, etc.). While other metadata formats are also capable of serialization into RDF, the extensibility and flexibility of TEI were a particular fit.

Third, we found that the generic nature of the TEI offered advantages over more domain-specific data models for metadata. MARC XML, Dublin Core, MADS, and EAD are primarily designed with the needs of library or archive use in mind, specifically the creation of standards or “authority” fields. Because in many cases the historical methodology of our project explicitly precludes privileging a uniform title or attributed author for a work above others, we needed the ability to record multiple valid titles and authors for scholarly purposes without having to designate one as authoritative or preferred. Only the TEI allowed such a neutral approach. (We do also recognize the need of catalogues for authority files. By using TEI as a base format, we are able to serialize MADS records with uniform titles for such use. The reverse would not, however, be possible had we begun with MADS. In sum, the granularity of our data in TEI makes crosswalks possible to other, less-granular bibliographic formats.)

Finally, TEI’s flexible <note> and <bibl> mechanisms allow us to include a variety of semi-structured textual, descriptive, and bibliographic information including excerpts from the texts of the works (“incipits”, “explicits”, “colophons”, etc.).
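The provenance mechanism described above can be sketched as follows. This is an illustration under stated assumptions, not the project's encoding: the titles, author names, xml:id values, and the claims_with_sources helper are hypothetical. It shows two competing author attributions, neither marked as preferred, each pointing through @source to the <bibl> that supports it.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical work with two conflicting titles and authors.
# Every value here is an invented placeholder.
SAMPLE_BIBL = """\
<bibl xmlns="http://www.tei-c.org/ns/1.0" xml:id="work-1">
  <title xml:lang="en" source="#src-1">Life of an Example Saint</title>
  <title xml:lang="en" source="#src-2">History of an Example Saint</title>
  <author source="#src-1">Author A</author>
  <author source="#src-2">Author B</author>
  <bibl xml:id="src-1">
    <title>Hypothetical Catalogue One</title>
    <citedRange unit="p">12</citedRange>
  </bibl>
  <bibl xml:id="src-2">
    <title>Hypothetical Catalogue Two</title>
    <citedRange unit="p">34</citedRange>
  </bibl>
</bibl>
"""


def claims_with_sources(bibl_xml, tag):
    """Pair each <tag> claim with the titles of the sources it cites via @source."""
    work = ET.fromstring(bibl_xml)
    # Index the nested source citations by their xml:id.
    sources = {
        b.get(XML_ID): b.findtext("tei:title", namespaces=NS)
        for b in work.findall("tei:bibl", NS)
    }
    claims = []
    for el in work.findall(f"tei:{tag}", NS):
        # @source may hold several space-separated pointers such as "#src-1".
        pointers = (el.get("source") or "").split()
        cited = [sources.get(p.lstrip("#")) for p in pointers]
        claims.append((el.text, cited))
    return claims


for claim, cited in claims_with_sources(SAMPLE_BIBL, "author"):
    print(claim, "according to", cited)
```

Because every claim carries its own pointer, a footnote-style bibliography can be regenerated from the data, and a crosswalk to a format that requires one uniform title could pick a single claim at export time without discarding the others in the TEI.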

  1. See also Jeanne-Nicole Mellon Saint-Laurent, “Gateway to the Syriac Saints: A Database Project,” The Journal of Religion, Media and Digital Culture 5, no. 1 (May 3, 2016): 183–204 [https://www.jrmdc.com/journal/article/view/78].
  2. David Allen Michelson, “Mixed Up by Time and Chance? Using Digital Methods to ‘Re-Orient’ the Syriac Religious Literature of Late Antiquity,” The Journal of Religion, Media and Digital Culture 5, no. 1 (May 3, 2016): 136–82 [https://www.jrmdc.com/journal/article/view/80]; “Syriaca.org as a Test Case for Digitally Re-Sorting the Ancient World,” in Ancient Worlds in Digital Culture, ed. Claire Clivaz, Paul Dilley, and David Hamidović (Leiden: Brill, 2016).
  3. IFLA Study Group on the Functional Requirements for Bibliographic Records, “Functional Requirements for Bibliographic Records: Final Report,” International Federation of Library Associations and Institutions, 1998 (revised 26 December 2007), [http://archive.ifla.org/VII/s13/frbr/frbr_current3.htm#3.2].
  4. Nathan P. Gibson, David A. Michelson, and Daniel L. Schwartz, “From Manuscript Catalogues to a Handbook of Syriac Literature: Modeling an Infrastructure for Syriaca.org,” Journal of Data Mining and Digital Humanities (forthcoming), https://arxiv.org/abs/1603.01207.

Poster is copyright Nathan P. Gibson and David A. Michelson, 2016, and licensed for publication under a CC BY 4.0 unported license.
