4/19/2016 Syriaca.org: Developing a Linked Open Dataset
During the spring 2016 semester, Syriaca.org received support from Vanderbilt University’s Jean & Alexander Heard Library for the project “Linked Data from the Medieval Middle East.” The project was funded as part of the Library Dean’s Fellows Program (http://library.vanderbilt.edu/about/deans_fellows/df_projects.php). The goal of the project was to enhance the digital data of Syriaca.org by serializing (rendering) it as “Linked Open Data.” The resulting dataset supports new research methods and makes it easier to share Syriaca.org’s data with research partners. The results of this project will be integrated into the main Syriaca.org site in the fall of 2016; a draft version is already available for testing.
The basic building block of Linked Data is RDF, or “Resource Description Framework.” RDF is a “standard model for data interchange” over the internet and can be used to encode and connect digital data across multiple online databases (https://www.w3.org/RDF/). What makes RDF useful is that it is a semantic language, one that has meaning for both computer and human readers of the data. This semantic aspect of Linked Data was Sir Tim Berners-Lee’s original vision for the World Wide Web, one in which data from different sites is connected to encourage and facilitate discovery across datasets (https://www.w3.org/DesignIssues/LinkedData.html). Semantic languages can also be used for reasoning, in this case allowing machines to draw logical conclusions from Syriaca.org’s data.
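To make the idea of machine reasoning concrete, here is a minimal sketch in plain Python. It treats RDF data as a set of (subject, predicate, object) triples and applies two standard RDFS entailment rules: if a resource has a type, it also has every superclass of that type, and subClassOf is transitive. The example.org URIs and the class names are hypothetical placeholders, not Syriaca.org’s actual data model; a real deployment would rely on a triple store’s built-in reasoner rather than code like this.

```python
# Each RDF statement is a (subject, predicate, object) triple.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Hypothetical namespace and classes, for illustration only.
EX = "http://example.org/"

triples = {
    (EX + "edessa", RDF_TYPE, EX + "City"),
    (EX + "City", RDFS_SUBCLASS, EX + "Settlement"),
    (EX + "Settlement", RDFS_SUBCLASS, EX + "Place"),
}


def infer_types(graph):
    """Apply RDFS rules rdfs9 (type propagation) and rdfs11
    (subClassOf transitivity) until no new triples appear."""
    inferred = set(graph)
    while True:
        new = set()
        subclass = [(s, o) for (s, p, o) in inferred if p == RDFS_SUBCLASS]
        # rdfs9: x type A, A subClassOf B  =>  x type B
        for (s, p, o) in inferred:
            if p == RDF_TYPE:
                for (sub, sup) in subclass:
                    if o == sub:
                        new.add((s, RDF_TYPE, sup))
        # rdfs11: A subClassOf B, B subClassOf C  =>  A subClassOf C
        for (a, b) in subclass:
            for (c, d) in subclass:
                if b == c:
                    new.add((a, RDFS_SUBCLASS, d))
        if new <= inferred:  # fixed point reached: nothing new was derived
            return inferred
        inferred |= new


closure = infer_types(triples)
types = sorted(o for (s, p, o) in closure if s == EX + "edessa" and p == RDF_TYPE)
print(types)
```

Although the data only says that Edessa is a City, the closure also contains the conclusions that it is a Settlement and a Place. This is the kind of inference a semantic format makes possible.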
In short, the project “Linked Data from the Medieval Middle East” helps Syriaca.org break down the walls between databases (the so-called “silo effect”) so that Syriaca.org can connect its own findings to multiple databases. These connections (federation) will allow for the growth of digital knowledge about the past.
The specific task of the project was to use the data of Syriaca.org to create a new linked dataset in RDF. The data in Syriaca.org was originally encoded in TEI/XML. However, human users do not see this encoding when they visit the site; they see the “front end,” a set of HTML pages that are easy to read, so that a human can quickly glean information, and that supply hyperlinks so that the user can explore other resources. Underneath these HTML pages, however, lies very complex digital data in TEI/XML. TEI/XML is ideally suited to represent complex humanities data, but this complexity also makes the data difficult to share. To share our data, we needed to create a simplified version using Linked Open Data, hence the need to transform it into RDF. We used a specific RDF syntax known as “Turtle” (TTL), which is relatively easy to read. In addition, we drew on a number of commonly used semantic vocabularies and ontologies (dcterms, rdf, rdfs, lawd, geo, skos), so that the resulting Linked Open Data can be queried by machines and partner projects.
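The transformation from TEI/XML to Turtle can be sketched as follows. This is a minimal illustration using only Python’s standard library, not the project’s actual conversion code. The TEI record is a hypothetical, heavily simplified place entry (real Syriaca.org records are far richer). The place number, the coordinates, and the choice of properties (lawd:Place as the class, rdfs:label for names, geo:lat/geo:long from the W3C Basic Geo vocabulary) are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, simplified TEI place record (illustrative id and coordinates).
TEI_RECORD = """
<place xmlns="http://www.tei-c.org/ns/1.0" xml:id="place-123">
  <placeName xml:lang="en">Edessa</placeName>
  <placeName xml:lang="syr">ܐܘܪܗܝ</placeName>
  <location type="gps"><geo>37.15 38.79</geo></location>
</place>
"""

PREFIXES = """@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix lawd: <http://lawd.info/ontology/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
"""


def escape(text):
    """Escape backslashes and double quotes (enough for these short names)."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def tei_place_to_turtle(xml_text, base="http://syriaca.org/place/"):
    """Serialize one simplified TEI <place> as a Turtle description."""
    place = ET.fromstring(xml_text)
    # "place-123" -> "123"; this URI pattern is an assumption for illustration.
    local_id = place.get(XML_ID).split("-")[-1]
    lines = [f"<{base}{local_id}> a lawd:Place"]
    # One language-tagged label per TEI placeName.
    for name in place.findall("tei:placeName", TEI_NS):
        lines.append(f'    rdfs:label "{escape(name.text)}"@{name.get(XML_LANG)}')
    # TEI <geo> holds "latitude longitude" separated by whitespace.
    geo = place.find("tei:location[@type='gps']/tei:geo", TEI_NS)
    if geo is not None:
        lat, lon = geo.text.split()
        lines.append(f'    geo:lat "{lat}"')
        lines.append(f'    geo:long "{lon}"')
    # Turtle separates predicates of one subject with ";" and ends with ".".
    return PREFIXES + "\n" + " ;\n".join(lines) + " .\n"


print(tei_place_to_turtle(TEI_RECORD))
```

The output groups every statement about the place under a single subject URI. This grouping is what keeps Turtle readable compared with the nested TEI source.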
One of the desired outcomes of this project is the sharing of Syriaca.org data, and during the fellowship we worked toward partnerships with other scholarly websites. The first of these was with Pelagios (http://commons.pelagios.org/) and its partner, Pleiades (http://pleiades.stoa.org/). Pelagios is a collaborative effort of over thirty scholarly websites that hold geographic data related to ancient art, archaeology, history, and literature. Participants are able to connect their data geographically using URIs from the Pleiades gazetteer. Syriaca.org data will now appear as part of this linked dataset.
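Connecting a place record to Pleiades amounts to asserting, in RDF, that a local place URI corresponds to a Pleiades place URI. The sketch below expresses that link with SKOS’s skos:closeMatch, which is one common way gazetteers record such alignments. Whether Syriaca.org’s Pelagios export uses exactly this property is an assumption. The Syriaca.org and Pleiades identifiers are hypothetical placeholders, and the helper name `close_match_triples` is invented for this example.

```python
import re

# Pleiades place URIs have the form http(s)://pleiades.stoa.org/places/<number>.
PLEIADES_URI = re.compile(r"https?://pleiades\.stoa\.org/places/\d+")


def close_match_triples(alignments):
    """Turn {local place URI: Pleiades URI} pairs into Turtle
    skos:closeMatch statements, rejecting malformed Pleiades URIs."""
    out = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."]
    for local, pleiades in sorted(alignments.items()):
        if not PLEIADES_URI.fullmatch(pleiades):
            raise ValueError(f"not a Pleiades place URI: {pleiades}")
        out.append(f"<{local}> skos:closeMatch <{pleiades}> .")
    return "\n".join(out) + "\n"


# Both identifiers below are hypothetical placeholders.
alignments = {"http://syriaca.org/place/123": "https://pleiades.stoa.org/places/000000"}
print(close_match_triples(alignments))
```

Validating the target URI before emitting a triple is a cheap guard against broken links, which would otherwise silently fail to connect the two datasets.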
Another significant accomplishment of the project was successfully hosting this linked data on a Vanderbilt Library server (http://dev-rdf.library.vanderbilt.edu/#/databases/syriaca). For this we had to set up a new database, an instance of the Stardog platform (http://stardog.com/). Stardog is a tool for exploring large RDF datasets to find relationships in the data that may not be visible to the human eye (using SPARQL queries). Because our Stardog database is available through an API on the web, our collaborators can now explore or download the Syriaca.org dataset for their own research, in whatever ways best fit their needs. Moreover, because this same data is linked with Pelagios, scholars searching through Pelagios will also be able to discover the Syriaca.org data.
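Collaborators typically reach such a database by sending a SPARQL query over HTTP. The sketch below builds, but does not send, a request in the style of the W3C SPARQL 1.1 Protocol: a GET with the query in a `query` parameter and an Accept header asking for JSON results. The endpoint URL is a hypothetical placeholder; the actual query path of the Vanderbilt Stardog instance may differ. The query itself assumes places are typed as lawd:Place with English rdfs:label values, which is also an assumption.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; the real Stardog query path may differ.
ENDPOINT = "http://example.org/syriaca/query"

# Assumed modelling: places typed lawd:Place, names given as rdfs:label.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX lawd: <http://lawd.info/ontology/>
SELECT ?place ?label WHERE {
  ?place a lawd:Place ;
         rdfs:label ?label .
  FILTER(lang(?label) = "en")
}
LIMIT 10
"""


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol GET request for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_sparql_request(ENDPOINT, QUERY)
print(req.full_url)
# Against a live endpoint, urllib.request.urlopen(req).read() would return the JSON results.
```

Keeping request construction separate from sending makes the query easy to inspect and reuse with other HTTP clients.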
Another project goal was to visualize the data in a more intuitive way to enhance research. Building on the Stardog SPARQL endpoint, we also created a visualization module (using d3.js). With the new visualization tool at Syriaca.org, scholars can explore, in a dynamic way, relationships within the data that they might not otherwise have found (http://wwwb.library.vanderbilt.edu/exist/apps/srophe/modules/d3sparql/index.html). For example, the tool can display a graph of related places.
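A d3.js network graph needs its data as a list of nodes and a list of links, whereas a SPARQL endpoint returns rows of variable bindings. The sketch below converts two-column results in the W3C SPARQL JSON results format into that nodes-and-links shape. The sample results, the place URIs, and the variable names `source` and `target` are hypothetical, and this is not the d3sparql module’s own code.

```python
import json

# Hypothetical SPARQL JSON results for a query such as
# SELECT ?source ?target WHERE { ... } that returns pairs of related places.
RESULTS = {
    "head": {"vars": ["source", "target"]},
    "results": {"bindings": [
        {"source": {"type": "uri", "value": "http://syriaca.org/place/1"},
         "target": {"type": "uri", "value": "http://syriaca.org/place/2"}},
        {"source": {"type": "uri", "value": "http://syriaca.org/place/1"},
         "target": {"type": "uri", "value": "http://syriaca.org/place/3"}},
    ]},
}


def to_d3_graph(results, source_var="source", target_var="target"):
    """Convert two-column SPARQL JSON results into {nodes, links},
    creating each node once and linking nodes by their URI."""
    seen = set()
    nodes, links = [], []
    for row in results["results"]["bindings"]:
        ends = []
        for var in (source_var, target_var):
            value = row[var]["value"]
            if value not in seen:  # add each distinct URI as one node
                seen.add(value)
                nodes.append({"id": value})
            ends.append(value)
        links.append({"source": ends[0], "target": ends[1]})
    return {"nodes": nodes, "links": links}


print(json.dumps(to_d3_graph(RESULTS), indent=2))
```

Because the links refer to nodes by URI rather than by array position, a d3 force layout would need `d3.forceLink(links).id(d => d.id)` to resolve them.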
To sum up, the project “Linked Data from the Medieval Middle East” significantly enhanced the data of Syriaca.org by creating new datasets and tools using Linked Open Data. We created a dataset in RDF, set up a Stardog database to hold and query it, shared our data with partner projects, and even created ways to visualize the data. This project made scholarly information freely available and greatly enhanced the ability of scholars to make new connections between historical datasets.
This project was only possible because of a diverse team of academic collaborators with a variety of talents and skills. The team would like to thank Dean Joseph Combs and the Vanderbilt Jean & Alexander Heard Library for their financial and logistical support. In addition, Rainer Simon, Elton Barker, and Leif Isaksen of the Pelagios Project provided helpful input.
Alex Ayris (Library Dean’s Fellow, Graduate Department of Religion)
Dr. David Michelson (Project Mentor, Graduate Department of Religion)
Winona Salesky (Lead Programmer, Independent Consultant)
Dr. Cliff Anderson (Project Mentor, Jean & Alexander Heard Library)
Suellen Stringer-Hye (Project Mentor, Jean & Alexander Heard Library)
Chris Benda (Project Mentor, Jean & Alexander Heard Library)
This report was prepared by Alex Ayris with the assistance of David Michelson.