Musings about a lecture: Deutsches Textarchiv Basisformat: A TEI Basic Format not only for German

Yesterday I had the pleasure of attending a lecture by Susanne Haaf on “Das DTA-Basisformat zur TEI-XML-konformen Annotation historischer Textressourcen” (the DTA base format for TEI-XML-compliant annotation of historical text resources) at the Berlin-Brandenburg Academy of Sciences (BBAW). The lecture was part of the DH-Kolloquium an der BBAW, a series of DH-related lectures organized by Alexander Czmiel, Stefan Dumont, Christian Thomas and Kay-Michael Würzner. With its stimulating content, this new initiative is also a welcome “prequel” to the Berlin DH regulars' table, which is organized by Markus Schnöpf from Digital Humanities Berlin and welcomes the Berlin DH community every month at its present location, the picturesque Deckshaus, a boat café in the centre of Berlin.

In her lecture Susanne Haaf introduced the DTA-Basisformat (DTABf), a base annotation format for historical texts in corpora and text collections. Because Susanne Haaf has already written an exhaustive German blog post about the current state of the DTABf, and the website of the Deutsches Textarchiv offers extensive German documentation (under the heading Dokumentation), I will not recap her very informative lecture. Being, like Susanne Haaf, currently a member of the H2020 project PARTHENOS, I will instead try to highlight in a few words why the DTABf is interesting to a wider audience. This point was discussed yesterday and is reflected in the efforts, mentioned in Susanne Haaf's blog post, to make the DTABf accessible to a wider community in English (a short introduction to the DTABf in English can also be found here).

 

Screenshot of tweet by ifDHb: Susanne Haaf presenting her lecture, Source: https://twitter.com/ifDHberlin/status/903652127507132416

 

The TEI has become more or less a de facto standard for the representation of text in digital form. However, the TEI is not very prescriptive. In a nutshell, one might say that the TEI Guidelines offer the community a variety of ways to encode individual phenomena found in the sources, but users are free to choose how to actually encode them (that is, which TEI elements and attributes best reflect their needs). This means that even though the TEI regulates encoding with its extensive tagset, there are often several markup options for similar phenomena, and that is a problem for interoperability. The DTA aimed to solve this problem for its corpus by reducing the TEI tagset and defining which attributes can be used, in order to resolve ambiguities and to enhance interoperability (e.g. to enable comparison and visualization).
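To make the idea of a reduced tagset a little more concrete, here is a minimal sketch in Python. It assumes a tiny, made-up allow-list of elements and attributes; this list is purely hypothetical and is not the actual DTABf tagset, and the names `ALLOWED`, `check_subset`, `VARIANT_A` and `VARIANT_B` are my own. The two variants encode the same emphasised word in two ways that plain TEI permits, and the check accepts only one of them.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical allow-list for illustration only; NOT the real DTABf tagset.
# Maps each permitted element name to the attributes permitted on it.
ALLOWED = {
    "text": set(),
    "body": set(),
    "p": set(),
    "hi": {"rendition"},
}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts before names."""
    return tag.rsplit("}", 1)[-1]


def check_subset(xml_string, allowed=ALLOWED):
    """Return messages for elements or attributes outside the allow-list."""
    problems = []
    root = ET.fromstring(xml_string)
    for elem in root.iter():  # includes the root element itself
        name = local_name(elem.tag)
        if name not in allowed:
            problems.append(f"element <{name}> is not in the subset")
            continue  # do not also report attributes of a rejected element
        for attr in elem.attrib:
            if local_name(attr) not in allowed[name]:
                problems.append(
                    f"attribute @{local_name(attr)} is not allowed on <{name}>"
                )
    return problems


# Two encodings of the same phenomenon (an emphasised word).
VARIANT_A = (
    f'<text xmlns="{TEI_NS}"><body>'
    '<p>ein <hi rendition="#i">Wort</hi></p>'
    "</body></text>"
)
VARIANT_B = (
    f'<text xmlns="{TEI_NS}"><body>'
    '<p>ein <emph rend="italic">Wort</emph></p>'
    "</body></text>"
)

if __name__ == "__main__":
    for label, doc in (("A", VARIANT_A), ("B", VARIANT_B)):
        print(label, check_subset(doc))
```

In a real project this job would be done by a schema rather than a hand-written check; the sketch only shows why a corpus becomes easier to compare and process once every text takes the same route for the same phenomenon.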

With its focus on interoperability, the value of the DTABf transcends German historical texts and the DTA, even though it was developed for the DTA, a digital archive mainly of historical German texts. This is already shown by its use in external projects, not all of which aim at a final integration into the DTA, and by the fact that, although initially developed in the context of the DTA, it is by now recommended by the DFG (the German Research Foundation) and CLARIN-D as an annotation and exchange format for editions and historical corpora. English documentation and more good-practice examples, including a more detailed statement of the editorial aims for which it is a good choice (e.g. a text-oriented edition vs. a document-layout-oriented edition of historical texts, as discussed yesterday), would therefore in my opinion greatly contribute to the international uptake of the DTABf as an addition to the customizations already provided by the TEI.

Last but not least, it is worth mentioning that the DTABf contains “subsets” for historical prints (1600-1900) and for manuscripts (DTABfM). Although this is not explicitly stated in the guidelines (at least I was not able to find it), the DTABfM is targeted not at medieval manuscripts but at early modern ones. As this information is based mainly on discussions with DTA practitioners, it would be interesting to delve deeper into the question of whether it also works for medieval manuscripts, at least for basic encoding (maybe one of my next posts?). Experiences, comments, ideas?

Addendum (8 September 2017): A German follow-up report and the slides of Susanne Haaf's lecture are now online.

 
