Tag Archives: TEI


Musings about a lecture: Deutsches Textarchiv Basisformat: A TEI Basic Format not only for German

Yesterday I had the pleasure of following a lecture by Susanne Haaf on “Das DTA-Basisformat zur TEI-XML-konformen Annotation historischer Textressourcen” at the Berlin-Brandenburg Academy of Sciences (BBAW), part of the DH-Kolloquium an der BBAW, a series of DH-related lectures organized at the BBAW by Alexander Czmiel, Stefan Dumont, Christian Thomas and Kay-Michael Würzner. With its stimulating content, this new initiative is also a welcome “prequel” to the Berlin DH regulars’ table, organized by Markus Schnöpf from Digital Humanities Berlin, which welcomes the Berlin DH community every month at its present location, the picturesque Deckshaus, a boat café in the centre of Berlin.

In her lecture Susanne Haaf introduced the DTA-Basisformat (DTABf), a base annotation format for historical texts in corpora and text collections. Because Susanne Haaf has already written an exhaustive German blog post about the current state of the DTABf, and the website of the Deutsches Textarchiv offers extensive German documentation (under the header Dokumentation), I will not recap her very informative lecture. As I am – like Susanne Haaf – currently a member of the H2020 project PARTHENOS, I will instead attempt to highlight in a few words why the DTABf is interesting to a wider audience, a point discussed yesterday and reflected in the endeavours mentioned in Susanne Haaf’s blog post to make the DTABf accessible to a wider community in English (a short introduction to the DTABf in English can also be found here).

 

Screenshot of tweet by ifDHb: Susanne Haaf presenting her lecture. Source: https://twitter.com/ifDHberlin/status/903652127507132416

 

The TEI has become more or less the de facto standard for the representation of text in digital form. However, the TEI is not very prescriptive. In a nutshell, the TEI Guidelines offer the community a variety of options for encoding individual phenomena found in the sources, but users are free to choose how to actually encode them (that is, which TEI elements and attributes best reflect their needs). This means that even though the TEI regulates encoding with its extensive tagset, there are often different markup options for similar phenomena, which causes a problem for interoperability. The DTA aimed to solve this problem for its corpus by reducing the TEI tagset and defining the attributes that can be used, in order to resolve ambiguities and enhance interoperability (e.g. enable comparison and visualization).
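To illustrate the problem with a hypothetical sketch (not taken from the DTABf guidelines): vanilla TEI allows the same source reading to be encoded in several conformant ways, and a prescriptive subset resolves the ambiguity by mandating one of them.

    <!-- Three TEI-conformant encodings of the same abbreviation "Dr.": -->
    <abbr>Dr.</abbr>
    <expan>Doctor</expan>
    <choice><abbr>Dr.</abbr><expan>Doctor</expan></choice>
    <!-- A constrained format such as the DTABf prescribes a single pattern
         (for instance the <choice> construction; whether the DTABf actually
         picks this one is beside the point here), so that all texts in the
         corpus can be queried and compared uniformly. -->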

With its focus on interoperability, the value of the DTABf transcends German historical texts and the DTA, although it was developed for the DTA, a digital archive mainly of historical German texts. This is already demonstrated by its use in external projects, not all of which aim at a final integration into the DTA, and by the fact that, although initially developed in the context of the DTA, it is by now recommended by the DFG (the German Research Foundation) and CLARIN-D as an annotation and exchange format for editions and historical corpora. Therefore an English documentation and more good-practice examples, including a more detailed statement of the kinds of editorial aims for which it is a good choice (e.g. text-oriented vs. document-layout-oriented editions of historical texts, as discussed yesterday), would in my opinion greatly contribute to the international uptake of the DTABf alongside the customizations already provided by the TEI.

Last but not least, it is worth mentioning that the DTABf contains “subsets” for historical prints (1600-1900) and manuscripts (DTABfM). Although not explicitly stated in the guidelines – at least I was not able to find it – the DTABfM targets early modern manuscripts, not medieval ones. As this information is based mainly on discussions with DTA practitioners, it would be interesting to delve deeper into the question of whether it also works for medieval manuscripts, at least for basic encoding (maybe one of my next posts?). Experiences, comments, ideas?

Addendum (8 September 2017): A German follow-up report and the slides of Susanne Haaf’s lecture are now online.

 


Leipzig and DH: Impressions

Summary: A lot has changed in the humanities since I had my first academic job in the context of an edition project at Leipzig University. The ongoing digital transformation of all humanities disciplines calls for more self-reflection on methodologies and for early as well as lifelong training. With the European Summer University in Digital Humanities and other important DH activities and actors, Leipzig is a DH hot spot and therefore was a very fitting place for a presentation of the PARTHENOS Training Suite.

From 2003 to 2005 I held my first position as a research associate in Dutch Studies at Leipzig University, where I worked on two editions of Middle Dutch texts. This July I finally had the opportunity to return to Leipzig. My last visit, for the DHd conference in Leipzig (2016), which I still remember very fondly – also because of my first visit to the legendary Faustian Auerbachs Keller (!), but that just as an aside – was already a while ago. The occasion was an invitation to present the PARTHENOS Training Suite at the European Summer University in Digital Humanities 2017. The GWZ in the Beethovenstraße, my old place of work, still exists, but much has changed, at Leipzig University and in scholarly editing. Just the right backdrop for a personal reflection.


ESU 2017 Poster

In 2003 digital scholarly editing and above all the TEI (Text Encoding Initiative) were still in their infancy, or rather still far from the enormous methodological influence they would later exert (link: history of the TEI). Even then it went without saying that the Middle Dutch editions should be produced digitally. But within the project, digital meant using a word processor, not XML editors or TUSTEP, which was still widespread at the time but far too complex for most use cases.

The editions of the two Middle Dutch texts were published long ago. I do not know whether the text files still exist, but even if they do, the end product was not a digital or hybrid edition but a print edition. Four thoughts:

  • For the text files, only the “look” of the print version mattered, not the kind of standardized markup that the TEI makes possible and that would allow these texts to be reused in other contexts, in portals, or with tools, quite apart from online access (see the sketch after this list).
  • On the other hand, the editor had an easy time of it and could readily combine scholarly and technical workflows in one person. If you roughly knew the “quirks” of your word processor, the learning curve was relatively flat.
  • Today the roles are far more differentiated, which is not least one of the reasons why digital editions are produced much more by teams than by individuals.
  • Version control was a nightmare, especially when others took over checking or partial tasks in between, since everything happened in a single document and (long before cloud-based collaboration tools) offline.
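For contrast, here is a minimal, hypothetical TEI skeleton of the kind such an edition could have used (title and content invented for illustration). Even this bare structure makes chapters and verse lines explicit, and therefore machine-readable:

    <TEI xmlns="http://www.tei-c.org/ns/1.0">
      <teiHeader>
        <fileDesc>
          <titleStmt><title>Sample Middle Dutch edition</title></titleStmt>
          <publicationStmt><p>Unpublished sample.</p></publicationStmt>
          <sourceDesc><p>Transcribed from a single manuscript witness.</p></sourceDesc>
        </fileDesc>
      </teiHeader>
      <text>
        <body>
          <div type="chapter" n="1">
            <lg>
              <l>First verse line of the chapter,</l>
              <l>second verse line of the chapter.</l>
            </lg>
          </div>
        </body>
      </text>
    </TEI>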

It has by now almost become common knowledge that the increased scholarly use of digital tools, methods and standards such as the TEI in the humanities calls for additional qualifications and methodological reflection. Of course, domain expertise must not be neglected in the process. If nobody can read historical manuscripts anymore and the editorial and domain-specific know-how and methodological understanding are missing, TEI Guidelines and XML editors will not help either… That is why tailored training and continuing-education offerings – whether as part of university curricula or as workshops, summer and winter schools, and online resources for students, research associates and professors alike – are immensely important. Not only to teach the practice, but also to reflect on the advantages, disadvantages and potential improvements. In the end every product lives on user feedback, and its “market value” rises with its familiarity and degree of use. Dutch has a very fitting proverb for this: “Onbekend maakt onbemind” (unknown makes unloved)… During my master’s in scholarly editing, for example, I was “only” taught TUSTEP and InDesign in the area of digital editions; I only became aware of the power of the TEI later, when I began to take a stronger interest in digital scholarly editing. Thanks to DH Oxford!

The European Summer University (ESU) in Digital Humanities, directed by Prof. Elisabeth Burr, is a very successful example of a format that both demonstrates the potential of digital research and teaches the required competencies hands-on while critically interrogating the methods. Especially exciting about the ESU is the broad “spread” of its audience, both geographically and sociologically (in the sense that the participants really do range from students to professors). A fitting place, then, to present the training and teaching materials and formats developed by the H2020 project PARTHENOS, the PARTHENOS Training Suite, as part of a project presentation. Once again, many thanks for the invitation and the perfect organization!


Display at the ESU registration desk

The ESU and Prof. Burr’s chair are, however, not the only DH hot spots in Leipzig. Particularly worth mentioning are, of course, the Humboldt Chair of Digital Humanities, held by Prof. Gregory Crane and his team, but also the DH activities of Leipzig University Library. Regarding the latter, one could almost say that DH and libraries are a match made in heaven. Libraries usually hold exactly the collections with which one can “do” DH. But how does the DH community get hold of this data? It cannot digitize everything itself; that would not only be inefficient, it is not always even possible. One task that libraries are therefore increasingly taking on is digitization as well as the provision and archiving of the data from digitization projects. It is especially welcome for research when this happens within the framework of an Open Digitization Strategy, as at Leipzig University Library, and when the data are, for example, presented via the Digitale Sammlungen and passed on to other search and processing systems.


Universitätsbibliothek Leipzig (Albertina)

Last but not least, Leipzig is also home to one of the two sites of the Deutsche Nationalbibliothek, whose Strategic Priorities 2017-2020 are strongly shaped by digital innovation.

If this has whetted your appetite for a trip to Leipzig: besides DH, the impressive terminus station and the atmospheric city centre alone are worth the journey. If the trip has to wait a little longer, a virtual visit to the DNB is recommended, more precisely the exhibition Zeichen – Bücher – Netze (2014) of the Deutsches Buch- und Schriftmuseum. Leipzig is worth it!


Leipzig Hauptbahnhof

DHBenelux2014: Boosting Digital Humanities in the Benelux states

DHBenelux 2014, June 12-13, 2014, The Hague, Netherlands

Hosted by the Royal Library of the Netherlands (Koninklijke Bibliotheek) and Huygens Institute for the History of the Netherlands  

The so-called Benelux countries (Belgium, the Netherlands, Luxembourg) share a long common history and often act together in modern Europe, as three small countries acting united can achieve more than three small countries acting alone. This was definitely true for the first joint DHBenelux conference this June in The Hague.

The number and diversity of attendees was astonishing, given that this was the first conference of its kind (have a look at the DHBenelux website for the program and attendees)! Not only were some of the most prominent DHBenelux researchers among the speakers, but the organizers were also able to attract Melissa Terras (Director of the UCL Centre for Digital Humanities and Professor of Digital Humanities at University College London (UCL)) as keynote speaker.

I departed at 5 in the morning by train from Göttingen and was really lucky that the serious thunderstorms of the days before had not blocked my travel route completely. Because I first had to go to my hotel (I stayed at a very tiny but very nice hotel called Statenhotel), I missed a few minutes of the very inspiring keynote by Melissa Terras (and, lucky me, I was even able to discuss with her a little during the coffee break…).

Thursday afternoon and Friday morning were filled with sessions of three researchers each, live demonstrations of some projects (Chordify was definitely very popular, but then they brought a real guitar), the conference dinner, and poster presentations. I will not start mentioning all the names here, nor go into detail concerning the content of the presentations, as some very good reviews have already been published by Marisa Martinez (http://dixit.hypotheses.org/348), Heidi Elaine Dowding (http://dixit.hypotheses.org/tag/dh-benelux) and Max Kemman (http://www.maxkemman.nl/2014/06/grasping-technology/), and many researchers were tweeting live during the conference using #DHBenelux or @DHBenelux. Wout Dillen even made a textual analysis of the tweets with Voyant Tools. More blog posts about the conference might be collected here.

Wout Dillen’s textual analysis of the DHBenelux Twitter feed

Truly funny was the experience that although all talks were given in English, the breaks were full of Dutch and only occasionally some French, German or English. However strange it might have felt for Dutch natives (and all the other non-native English speakers as well) to communicate in English during the official part of the conference, the non-Dutch speakers truly appreciated this effort because it opened up the discussion for all. So I truly hope that the language policy will not change next year. I guess we have to accept that English has become the lingua franca of international research communities (and probably most researchers nowadays prefer English to Latin, which was the academic lingua franca up to the 19th century, just saying…).


KB Aula wall poem

The organizing team of DHBenelux 2014 (Steven Claeyssens, Karina van Dalen-Oskam, Mike Kestemont and Marijn Koolen) did a great job putting together a varied program of talks, posters and live presentations. The organizing heart of the conference was Suzanne van Kaam, who never lost track of any tiny detail. I am very thankful that my talk about encoding the Vierde Partie by Lodewijk van Velthem in TEI was accepted (abstract: http://dhbenelux.org/wp-content/uploads/2014/06/unstable-data-wuttke.pdf; the slides are available on SlideShare: http://de.slideshare.net/DHBenelux/towards-a-digital-edition-of-the-vierde-partie-of-the-speigel-historiael). I definitely learned a lot! Thank you all for the great hospitality!

Given this overall success, all participants were happy to hear next year’s conference announced for June 11-12, 2015, in Antwerp. I am convinced that this kind of activity can truly boost DH research in the Benelux states, as it not only provides a platform for DHBenelux researchers to meet and share ideas, but also raises the profile of DH activities in the Benelux at an international level.

See you in Antwerp!

 

Torn between Creating Data and Saving Data

As the deadline for DHBenelux 2014 approaches, I am starting to panic slightly because my paper is still not ready. Quite a lot of my time is absorbed by my new job, as a more formal kick-off of the Humanities Data Centre project is scheduled for next week. With many partners involved (you can read the official press release of the GWDG here, sorry, only in German), you can imagine it takes a lot of communication and getting to know each other. I am also trying to get to know my new town and have already found a nice yoga studio and a little choir. And I have to admit that I really need some stretching and singing to balance my otherwise quite sedentary activities, so I have to make time for this as well.

So while I am thinking of saving data, I am also thinking of how to create data, in my case a digital edition of the German Vierde Partie of the Spiegel Historiael by Lodewijk van Velthem. And the more I think about it while actually reading Patrick Sahle’s dissertation on digital editions (this enormously inspiring work is published Open Access here, again only in German), the more I start doubting. Now, doubt in academia is not generally a bad thing, but the start of a good research question; at the moment, however, I still feel miles away from translating the implications of his approach into the actual edition process. The good news is, there are almost two more weeks to go until the conference… so some time is left to come up with strategies to resolve this riddle!

If you can read German and are interested in digital editions, do get Patrick Sahle’s books! No easy read, but worth the time!


So many forms of texts… Cover of the third part of Patrick Sahle, Digitale Editionsformen (Schriften des Instituts für Dokumentologie und Editorik, 9), Norderstedt, 2013. Image source: http://www.i-d-e.de/wordpress/wp-content/uploads/2013/02/cover_Bd_9-klein.png.

 

Impressions from WebMontag Göttingen / Einstein-Zirkel Berlin

Just in time before the weekend, some impressions from WebMontag Göttingen and the Berlin Einstein-Zirkel workshop.

Once a month a group of Digital Humanities enthusiasts gathers at the Göttingen Centre for Digital Humanities (GCDH) for Webmontag, or Web Monday. These informal meetings aim to offer a meeting point for people interested in issues relating to Web 2.0 and especially for the active Göttingen DH scene (http://www.gcdh.de/en/events/web-monday).
I was invited to join the March meeting, which focussed especially on digital editions. Next to my Vierde Partie project, for which I am currently staying at the Berlin State Library (http://staatsbibliothek-berlin.de/) as a bursary holder of the Stiftung Preußischer Kulturbesitz, several current and future Göttingen-based DH projects were presented (http://webmontag.de/location/goettingen/2014-01-13).
During the lively discussion after my short presentation, which acted as a kind of kick-off, I got some very useful tips concerning the issue of how to represent the many special characters and abbreviations of medieval manuscripts in TEI-XML. The main point of critique was that using characters from the Private Use Area (PUA) of Unicode might conflict with the aim of general compatibility. I intend to address this issue more profoundly in a later blog entry, as I still have to think about how to actually solve this problem and decide on a reasonable compromise.
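One possible middle ground is the TEI gaiji module (a hedged sketch; the glyph name and PUA code point below are invented for illustration): declare each special glyph once in the header and reference the declaration from the transcription, so that a PUA code point, if used at all, stays an optional mapping rather than a bare character in the text.

    <!-- In the teiHeader, inside encodingDesc: declare the glyph once -->
    <charDecl>
      <glyph xml:id="r-rot">
        <glyphName>SMALL LETTER R ROTUNDA (scribal variant)</glyphName>
        <mapping type="standard">r</mapping>
        <mapping type="PUA">&#xF20E;</mapping>
      </glyph>
    </charDecl>

    <!-- In the transcription: reference the declaration via <g> -->
    <w>ove<g ref="#r-rot">r</g></w>

This keeps the transcription processable by software that knows nothing about the PUA, while still recording the exact glyph for those who need it.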
The current projects presented at this WebMontag were the notebooks of Theodor Fontane (http://www.uni-goettingen.de/de/303691.html) and Blumenbach online (http://www.blumenbach-online.de/). It was really impressive to get an insight into the progress of these projects. One of the future Göttingen projects will be a Mayan dictionary, which will have to deal with the problem of how to encode the highly complicated graphical Mayan hieroglyphs that are transmitted mainly as inscriptions on buildings and on archaeological objects such as vases (as most manuscripts were destroyed during the conquest of South America). Some information can be found on the website of the project (http://www.iae.uni-bonn.de/forschung/forschungsprojekte/laufende-projekte/idiom-dictionary-of-classic-mayan/interdisciplinary-dictionary-of-classic-mayan-idiom). The other project is an edition project on the Geschichte der Neologie, which aims at digitally editing important works in this field and showing the development of specific thoughts in these works over a longer period.
As the focus of this evening on digital editions was considered very successful by its participants, the idea is to feature digital editions more often as a special theme of the Göttingen Webmontag in the future. This would offer a possibility to discuss in more detail problematic issues that emerged during the discussion, such as the citability of digital editions and whether a machine-readable-only edition is still an edition.

During the general discussion Jörg Wettlaufer also gave a short account of the third workshop of the Einstein-Zirkel Digital Humanities Berlin (http://www.digital-humanities-berlin.de/werwirsind, http://www.digital-humanities-berlin.de/archive/1064), which took place on 28 February at the Freie Universität Berlin. This workshop was, at least in terms of participant numbers, a huge success. I was very lucky to be in Berlin at that moment and was therefore able to attend the meeting as well and talk to many of the poster presenters and participants. A setback was the rather undecided vision, voiced by some big DH players in Berlin during the plenary discussion, of acting together to strengthen Berlin’s DH infrastructure. You can read Jörg Wettlaufer’s more detailed account of this event on his blog (http://digihum.de/2014/03/digital-humanities-in-berlin-grenzen-ueberschreiten-28-2-14/), and the book of abstracts (more than 60! projects) can be downloaded as a PDF (https://edoc.hu-berlin.de/docviews/abstract.php?lang=ger&id=40508).

Impressions from ‘Historical Documents Digital Approaches’, 5-7 September 2013

Better late than never, some impressions from the HDDA 2013 workshop. This workshop took place at Ghent University at the beginning of September and not only included fantastic speakers (hdda_leaflet), but was also very well organised, including fantastic sandwiches for lunch and splendid sunshine (though the organisers might not be held responsible for the latter). It was only a bit unfortunate for a DH workshop that the Wi-Fi was not working in the lecture room, but again I am sure that the organisers didn’t have a hand in this, and I can only speculate that the planners of the UFO (the modern housing of the history department and parts of the UGent administration) meant it to be like that, to prevent people from checking their email in the lecture rooms.

Unfortunately I missed the first sessions due to private circumstances; the rest of the morning lectures of these three days spanned from Bert van Raemdonck (Ghent), who lectured on editing letters in TEI, to Caroline Macé (Leuven), who lectured on how to use digital tools to analyse and visualise the history of texts (stemmata).
A very relevant point was that TEI is only one of the available encoding schemes, but also the most widely used one, so if it fits your needs, USE it. This will make your results shareable (and please do share your code!) and easily mineable, and it also offers the advantage that more TEI-advanced scholars are usually very willing to lend newbies a hand (for this one can, for example, join the TEI-list).
Another important point was that scholars who use digital documents and tools have to be aware of what they are doing and of the implications for their scholarly work. Worst-case scenarios mentioned here included scholars who measured properties of medieval manuscripts using digital facsimiles without taking into account that measuring a picture may not lead to correct measurements. Very fascinating were also the lectures on computational topic recognition and computational authorship attribution. I have to admit that in the beginning this sounded like magic to me, but after hearing and reading more about the methods and tools I am starting to understand the underlying logic.

The afternoons were reserved for hands-on training in TEI-conformant XML with David Birnbaum (Pittsburgh) using Oxygen. In contrast to earlier trainings in which I had participated, he focussed more on actually encoding the body of the text than on the TEI header, so that in the end all participants had some idea of how to practically encode a historical document in TEI, including abbreviations and variants in the transmission history. And yes, we sighed… because if you want to encode all this, your TEI code starts looking very unattractive, that is to say, so chaotic that you almost cannot see anymore what you are doing. That is why the advice to first think very carefully about what you actually need, who your audience is, and what kinds of uses you want to enable is very good advice indeed.
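To give an impression of how dense such markup gets, here is a made-up verse line (words and witness sigla invented for illustration), combining the TEI critical apparatus with abbreviation markup:

    <l n="1">In den
      <app>
        <rdg wit="#A">
          <choice>
            <abbr>iaer</abbr>
            <expan>iaere</expan>
          </choice>
        </rdg>
        <rdg wit="#B">iare</rdg>
      </app>
      ons heren</l>
    <!-- A single variant plus a single abbreviation already make the
         transcribed words almost disappear inside the markup. -->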

To sum up, DOING Digital Humanities in many cases means learning to handle code and tools that are rather unfamiliar to the traditional humanities scholar. It is a barrier one has to overcome, and not being afraid to ask questions and make mistakes is probably an essential part of the process. To remind us in future that editing with TEI is a lot of work (so rather start sooner than later), David Birnbaum gave everybody a “Shut up and Code” button. I don’t regret having spent my birthday coding… it was fun! Thanks to the organisers (especially Tjamke Snijders and Els De Paermentier, UGent) and sponsors, and I really hope that a follow-up will take place next year. Why not have hands-on experience with XSLT or Mallet then?

The “Shut up and Code” button

Upcoming: DigHum13 Summer School 2013

I just submitted my first ever poster abstract, for the DH Summer School that will take place in Leuven (Belgium) from 18 to 20 September 2013! Pretty exciting, I must say. I am really curious how the jury will like my idea about a modular digital edition of the Vierde Partie of the High German Spiegel Historiael. Of course I strongly hope they will accept it, so that I can receive feedback and practical advice during the poster presentation from the senior researchers who will be present. But even without a poster, I cannot wait for the end of September. The program is, so to say, mouthwatering…

The Berlin manuscript in its original binding