Yesterday I had the pleasure of attending a lecture by Susanne Haaf on “Das DTA-Basisformat zur TEI-XML-konformen Annotation historischer Textressourcen” (the DTA base format for TEI-XML-compliant annotation of historical text resources) at the Berlin-Brandenburg Academy of Sciences (BBAW). The lecture was part of the DH-Kolloquium an der BBAW, a series of DH-related lectures organized by Alexander Czmiel, Stefan Dumont, Christian Thomas and Kay-Michael Würzner. With its stimulating content, this new initiative is also a welcome “prequel” to the Berlin DH regulars’ table, which is organized by Markus Schnöpf of Digital Humanities Berlin and every month welcomes the Berlin DH community to its current venue, the picturesque Deckshaus, a boat café in the centre of Berlin.
In her lecture Susanne Haaf introduced the DTA-Basisformat (DTABf), a base annotation format for historical texts in corpora and text collections. Because Susanne Haaf has already written an exhaustive German blog post about the current state of the DTABf, and the website of the Deutsches Textarchiv (DTA) offers extensive German documentation (header Dokumentation), I will not recap her very informative lecture. Since I am, like Susanne Haaf, currently a member of the H2020 project PARTHENOS, I will instead attempt to highlight in a few words why the DTABf is interesting to a wider audience. This point was discussed yesterday and is reflected in the endeavours, mentioned in Susanne Haaf’s blog post, to make the DTABf accessible to a wider community in English (a short introduction to the DTABf in English can also be found here).
The TEI has become more or less a de facto standard for the representation of text in digital form. However, the TEI is not very prescriptive. In a nutshell, one might say that the TEI Guidelines offer the community a variety of ways to encode individual phenomena found in the sources, but users are free to choose how to actually encode them (that is, which TEI elements and attributes best reflect their needs). This means that even though the TEI regulates encoding with its extensive tagset, there are often several markup options for similar phenomena, and this causes problems for interoperability. The DTA aimed to solve this problem for its corpus by reducing the TEI tagset and by defining which attributes can be used, in order to resolve ambiguities and to enhance interoperability (e.g. to enable comparison and visualization across texts).
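To make the idea of a reduced tagset a little more concrete, here is a minimal Python sketch of how a project might check a TEI fragment against a restricted list of elements and attributes. Please note that the allow-list `ALLOWED`, the function names and the sample fragment are my own hypothetical illustrations and are not the actual DTABf rules. The sketch only relies on the fact that the TEI offers both `<hi>` and `<emph>`, and both `@rend` and `@rendition`, for marking highlighted text.

```python
import xml.etree.ElementTree as ET

# Hypothetical allow-list for illustration only; this is NOT the real DTABf tagset.
# Each permitted element maps to the set of attributes permitted on it.
ALLOWED = {
    "text": set(),
    "body": set(),
    "p": set(),
    "hi": {"rendition"},
}

# Made-up sample: three ways of marking highlighted text in one paragraph.
SAMPLE = """
<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <p>An <hi rendition="#i">italic</hi> and an <emph>emphasised</emph> word,
       plus <hi rend="italic">another</hi> one.</p>
  </body>
</text>
"""


def local_name(qualified_name):
    """Strip a '{namespace}' prefix as produced by ElementTree, if present."""
    return qualified_name.rsplit("}", 1)[-1]


def find_violations(xml_string):
    """Return messages for elements/attributes outside the allow-list."""
    root = ET.fromstring(xml_string.strip())
    violations = []
    for elem in root.iter():  # document order, root included
        name = local_name(elem.tag)
        if name not in ALLOWED:
            violations.append(f"element <{name}> is not in the subset")
            continue
        for attr in elem.attrib:
            attr_name = local_name(attr)
            if attr_name not in ALLOWED[name]:
                violations.append(
                    f"attribute @{attr_name} is not allowed on <{name}>"
                )
    return violations


if __name__ == "__main__":
    for message in find_violations(SAMPLE):
        print(message)
```

In this toy setup only one of the three encodings passes, which is exactly the kind of narrowing that makes texts from different projects comparable. A real project would of course validate against a proper schema rather than a hand-written dictionary.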
With its focus on interoperability, the value of the DTABf extends beyond the DTA and beyond German historical texts, even though it was developed for the DTA, a digital archive mainly of historical German texts. This is already shown by its use in external projects, not all of which aim at a final integration into the DTA, and by the fact that it is by now recommended by the DFG (the German Research Foundation) and CLARIN-D as an annotation and exchange format for editions and historical corpora. In my opinion, English documentation and more good-practice examples, including a more detailed statement of which editorial aims it suits (e.g. a text-oriented edition vs. a document-layout-oriented edition of historical texts, as discussed yesterday), would therefore greatly contribute to the international uptake of the DTABf as an addition to the customizations already provided by the TEI.
Last but not least, it is worth mentioning that the DTABf contains “subsets” for historical prints (1600-1900) and for manuscripts (DTABfM). Although this is not explicitly stated in the guidelines (at least I was not able to find it), the DTABfM is targeted not at medieval manuscripts but at early modern ones. As this information is based mainly on discussions with DTA practitioners, it would be interesting to delve deeper into the question of whether it also works for medieval manuscripts, at least for basic encoding (maybe one of my next posts?). Experiences, comments, ideas?
Addendum (8 September 2017): A German follow-up report and the slides of Susanne Haaf’s lecture are now online.