Category Archives: conferences

“DH ist kein Ponyhof” (“DH is no walk in the park”): Experiences from the Super-Experiment #twitter101dh at vDHd2021

Cite as: Ulrike Wuttke, “DH ist kein Ponyhof”: Erfahrungen vom Super-Experiment #twitter101dh bei der vDHd2021, Blogpost, 12.04.2021, CC BY 4.0, Link: https://ulrikewuttke.wordpress.com/2021/04/10/dh-ist-kein-ponyhof/

The first event week of vDHd2021 is over, and with it the four data salons of “#twitter101dh: Superexperiment zu Twitter, Bibliotheken und COVID-19” (super-experiment on Twitter, libraries, and COVID-19). In this blogpost I would like to report a little on our experiences.

In the submission and announcement of the super-experiment, we, i.e. the organising team (Daniel Brenn, Lisa Kolodzi, Mareike König, and I), spoke of a “Twitter laboratory” in which we wanted to carry out various experiments around Twitter data. The focus was to be an open-ended exploration of the possibilities of analysing Twitter data, together with the first steps of its practical implementation, very much in the spirit of the vDHd2021 motto “Experimente” (experiments).

We were very glad to have won Paul Ramisch and Sophie Schneider, two experienced tool and data buddies, to accompany us on our way. We had also recruited further participants via Twitter and set up a small website with a GitHub repo and a Discord channel for communication, and so the experiment could begin.

In a total of four data salons we devoted ourselves to different aspects of analysing Twitter data, from tweet scraping and first analyses with R (1st data salon with Paul Ramisch), via the development of research questions (2nd data salon with Mareike König), to network analysis with Gephi (3rd data salon with Sophie Schneider). For the first steps with R and Gephi, Paul Ramisch and Sophie Schneider each provided fantastic tutorials and resources, which are available for reuse via the website. Thank you!
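The data salons worked with R, but the same first step, counting hashtags in a set of scraped tweets, can be sketched in a few lines of Python as well (the tweets below are invented for illustration; real data would come from the Twitter API or a scraping tool):

```python
from collections import Counter
import re

# Toy stand-ins for scraped tweets; real data would contain many more fields.
tweets = [
    {"user": "library_a", "text": "Our reading room reopens Monday! #COVID19 #libraries"},
    {"user": "library_b", "text": "Click & collect continues. #libraries"},
    {"user": "library_a", "text": "Updated opening hours. #COVID19"},
]

def extract_hashtags(text):
    """Return all hashtags in a tweet text, lowercased."""
    return [tag.lower() for tag in re.findall(r"#(\w+)", text)]

# Count hashtags across all tweets and show the most frequent ones.
counts = Counter(tag for t in tweets for tag in extract_hashtags(t["text"]))
print(counts.most_common(2))
```

Even a toy example like this already raises the meta-questions discussed in the data salons: what counts as the “same” hashtag, and what do raw frequencies actually tell us?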

As our focus we had chosen the analysis of Twitter data on the topic of COVID-19 and libraries, in order to learn, using this use case, the tools for analysing Twitter data independently. So what did we learn in the four data salons? We reflected on this at length in the 4th data salon. Here I can of course only speak for myself, but I think that some of the participants felt quite similarly, as the discussion showed.

First of all, it was a great experience to experiment together with this group and to ask our two data and tool buddies questions about practical and theoretical aspects. Learning to handle tools in particular comes with a steep learning curve, and small problems can pose big challenges to newbies. This was not only about practical hurdles, such as the strange error message in RStudio that I could only fix after googling for a while (as Paul said: when coding, Google is our best friend); it turned out that I had not yet installed R on my new computer (phew!). It was also about meta-topics such as data and tool literacy. What actually are Twitter data, what can they tell us and what not, and how do I interpret and verify the results? Which questions can I answer with the help of quantitative statistics, what role do qualitative analyses play, and where does interpretation come in? With Sophie Schneider we had a very lively discussion, especially about the possibilities and challenges of network analysis with Gephi. Without a deep understanding of the underlying concepts of network analysis and of the parameters, experimenting with Gephi is certainly very exciting, but the tool and its results remain essentially a black box. I also found it particularly exciting that we were able to win two students as data and tool buddies, which shows once again that anyone can be an expert!
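To make the “black box” point concrete: one of the most basic measures Gephi reports, degree centrality, can be recomputed by hand, which is a good first step towards understanding the underlying concepts (the mention network below is invented for illustration):

```python
# Edges of a tiny, invented Twitter mention network: (who, mentions_whom).
edges = [
    ("library_a", "ministry"),
    ("library_b", "ministry"),
    ("reader_1", "library_a"),
    ("reader_2", "library_a"),
]

# Degree = number of edges touching a node (direction is ignored here).
degree = {}
for src, dst in edges:
    degree[src] = degree.get(src, 0) + 1
    degree[dst] = degree.get(dst, 0) + 1

# Normalised degree centrality: degree / (n - 1), as in standard network analysis.
n = len(degree)
centrality = {node: d / (n - 1) for node, d in degree.items()}
print(sorted(centrality.items(), key=lambda kv: -kv[1]))
```

Once the numbers a tool reports can be reproduced by hand on a toy network, its output stops being magic and becomes something that can be checked and interpreted.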

For my practical part, I then concentrated on R and carried out some first analyses on the topic of the super-experiment, the Twitter communication of libraries about COVID-19; hopefully more on that elsewhere. I not only analysed the Twitter data but also tried to document every step in detail. And that really takes a lot of time (hello, resource planning for data management)! For this reason alone, in my opinion, everyone who comes even remotely into contact with such topics (or who is supposed to estimate or approve the effort involved) should code and document a little at least once, or ask someone with hands-on experience.

Right at the beginning of his tutorial, Paul Ramisch told us that he would work with us using the concept of permanent overload (“permanente Überforderung”), i.e. that we would first do a few things with R that we might not yet fully understand; that would come later. And that was a central insight for me too: everything takes time. The idea that we would comprehensively learn R and Gephi in four sessions was somewhat optimistic. What we did achieve, and that is actually much more important, is a critical engagement with these two tools through practical application and a deeper understanding of the deep learning curve involved in using them confidently. The scientifically sound use of Digital Humanities methods and tools means more than just pushing some buttons (not that I ever claimed otherwise, hence the ironic title of this blogpost); it requires comprehensive data, code, and tool literacy (digital literacy) and theoretical reflection, both topics that have rightly been in focus recently.

Franz & P, Das Leben ist kein Ponyhof, St. Oberholz, Berlin, Flickr, CC BY-NC-SA 2.0

More on #twitter101dh can be found in Sophie Schneider’s blogpost “#vdhd2021 – Erste Eventtage”. For those who want to run Twitter analyses in Python, Sophie Schneider has also written a tutorial. Further links to tutorials and resources can be found on the pages of the data salons. Some interesting thoughts in this context can also be found in Markus Krajewski’s article “Hilfe für die digitale Hilfswissenschaft: Eine Positionsbestimmung.” in Zeitschrift für Medien- und Kulturforschung 10:1 (2019), pp. 71–80 [link to PDF].

I would be happy to receive further pointers to Twitter data tutorials for R, but also to studies that have emerged from them, via the comments, Twitter, or other channels!

#Open Hacks for Conference Presentations and Panels (not only) for Digital Humanists

Cite as: Ulrike Wuttke, #Open Hacks for Conference Presentations and Panels (not only) for Digital Humanists, Blogpost, 01.08.2019, CC-BY 4.0. Link: https://ulrikewuttke.wordpress.com/?p=1696, last edited 05.04.2020 (I added some extra advice I received via Twitter, thanks!).

Over the last months I attended DHd 2019 in Mainz and Frankfurt am Main and DH2019 in Utrecht (Netherlands). These were great conferences: great people, many interesting presentations and discussions. However, they also inspired this blogpost, which sums up some hacks for conference presentations and chairing panels: how to prepare them, give them, and spread them online. These are hacks that work for me and that I hope will be useful for others.

The following hacks are in a way ‘collected wisdom’. They are tricks and hints that I have read somewhere or was given at some point and that I internalised. Where I remembered the sources, I have quoted them. I have also included some great general resources at the end of this post. If you have more hacks, please let me know so that I can include them!

For Presenters

Preparation Phase

  • #1 Prepare your talk in advance! Presentations written on the plane or the night before tend to be poor and badly timed.
  • #2 Use a tool like Speechinminutes to calculate the length of your speech! If you go over time you will be stealing time from other presenters which is unfair. A good chair will cut your speech when you have reached the time limit. 
  • #3 Don’t lose time explaining every tiny detail of how you reached a conclusion. Focus on the main points and arguments, and even consider talking less and leaving more time for questions and answers. Read the interesting advice “Flip your presentation format” by Pat Thomson.
  • #4 At the end, you can point to literature and other resources, even your research data, that underpin your argument, for more information. See the website of HU Berlin for practical information on how to share research data, and read my blogpost about Open Access to Research Data in the Humanities.
  • #5 When preparing your slides, remember that slides are meant to be a visual aid! Avoid slides that are too densely packed with text (German “Bleiwüste”). 
  • #6 To enhance readability, use a big font size (at least 24 pt, especially for modern smaller presentation systems) and watch the size of screenshots (zoom in for details); nothing is more frustrating than text or screenshots with the presenter commenting “you probably can’t see it…”.
  • #7 If you plan to publish your slides online, include the URL/DOI of the presentation on each slide or, to avoid cluttering your slides, at least on the first and last slide (some Open Access repositories like Zenodo allow you to reserve a DOI).
  • #8 Get an ORCID iD (Open Researcher and Contributor ID) so that your presentation slides can be linked to your scientific record. Increasingly, other forms of scientific output such as presentations, research data, etc. are considered an intrinsic part of a researcher’s scientific record, and therefore they should be published and easily identifiable (see DORA).
  • #9 To structure your presentation, “breadcrumbs” (headers or footers that indicate which part of the presentation a slide belongs to) are useful, especially for longer presentations.
  • #10 Put your full contact information on the last slide and your Twitter handle on the first, so people can tweet about your great presentation more easily.
  • #11 Avoid live demonstrations of databases, websites, etc. Conference wifi is often rather slow or not working at the moment you need it, and live demonstrations take a lot of time. Pro tip: Use screenshots (in the right size, zoom in for details) or record a screencast (quick-and-dirty screencasts can be made with QuickTime or Camtasia; universities etc. often have a licence).
  • #12 Practice your presentation at least once! Practice increases your confidence and improves the timing (to repeat it, really, an unprepared talk is not cool).
  • #13 Check your slides for spelling mistakes! Spelling mistakes look rather unprofessional and are rather distracting. Pro tip: As we tend to be our own worst proofreaders, ask someone else to proofread your slides for you. 
  • #14 If you intend to read from a script (that’s okay, not everyone is Cicero ;-)), print your text in a larger font (e.g. 14 pt); I have also seen people read their presentations from tablets.
  • #15 Pro tip for the bold ;-): Share a pre-version of your slides for comments from the community, e.g. via Twitter using Google Presentations (though there seem to be some problems with disability access).
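Hack #2 above boils down to simple arithmetic: a script’s speaking time is its word count divided by the speaking rate. A minimal sketch in Python, assuming a typical rate of about 130 words per minute (tools like Speechinminutes use comparable values):

```python
def speech_minutes(word_count, words_per_minute=130):
    """Estimate speech length in minutes for a script of `word_count` words."""
    return word_count / words_per_minute

# A 2600-word script at 130 words per minute takes about 20 minutes.
print(round(speech_minutes(2600)))
```

Working backwards is just as useful: for a 15-minute slot at this rate, aim for a script of at most about 1950 words, and then still practise the talk with a timer.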

Presentation Phase

  • #1 Hand over your presentation to the chair / organising committee in time (and in the required format!)
  • #2 Be in your presentation room in time to get accustomed to the equipment and test your presentation. Pro tip: Always bring your presentation with you (additionally as PDF), in case it is not on the presentation computer or looks weird.
  • #3 Bring your own adapter (especially for Mac)! 
  • #4 My personal hack to deal with sliding computers:
Inserting a pen might prevent your laptop from sliding over low barriers on speakers’ desks
(Picture by Ulrike Wuttke, CC0)
  • #5 Start your presentation with a short icebreaker (why you are happy to present, what connects you to the place or conference, etc., and say your name again) to bond with your audience. Thanking the chair and organisers for the introduction and invitation is also a kind thing to do!
  • #6 Look up from your presentation from time to time, especially when you read from a script; again, practising will help you! Also, don’t talk to the slides with your back to the audience!
  • #7 A presentation is not a rap! Don’t go too fast to squeeze in a few more words. It’s really terrifying for the audience! 
  • #8 Leave enough time for the audience to view your slides! Do not skip through loads of slides, this is really frustrating for your audience. If you know you don’t have time to view all slides, exclude them in advance. 
  • #9 If your presentation includes quotes or text in uncommon languages, first read a translation or a paraphrase before you read the original. This way, your audience knows what it will be about and will have enhanced understanding. Thanks to @katharinakager3!
  • #10 Stick to timing (see practice)! Use a timer on your phone etc., so you don’t steal time from others and to allow room for discussion. Pro tip: Consider making your presentation shorter than the allotted time, if the question time is really short and you know this in advance.
  • #11 Avoid dry mouth: Bring your own bottle of NON SPARKLING water. Thanks to @CHPrager!
  • #12 Last but not least, there is a Twitter thread on how to deal with hostile questions.

Post Presentation Phase

  • #1 Publish your presentation as PDF (preferably PDF/A; Zenodo uploads with a PDF preview get way more interactions!) and additionally in whatever format your presentation program of choice produces (html/xml are also interesting formats) in an Open Access repository such as HAL, Humanities Commons, or Zenodo. Check if the organizers have provided a “community” for the conference and add your presentation there. Don’t forget to include conference metadata in your upload! You can also include speaker’s notes or a separate script of your presentation.
  • #2 Make a blog post of your presentation; a nice example by Dot Porter: The Uncanny Valley and the Ghost in the Machine, 2018
  • #3 Some formats like Prezi, Google Docs, and Google Presentations seem to have problems with disability access; get informed about these issues and choose alternative formats. Read more by Barry Dahl about Accessibility Concerns of Using Prezi in Education (and let me know if the situation has changed).
  • #4 Tweet your published presentation using the conference hashtag and the DOI of your full upload
  • #5 When you tweet your presentation or other new research, a so-called Twitter poster is more fun for your audience. Click through this Twitter thread by Mike Morrison to learn how to make them (incl. reusable templates), and click here to learn how to make them accessible for visually impaired users. (Note: adding a description to any pictures you post also enhances your other tweets.)
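To make hack #1 more concrete, here is a hedged sketch of what “conference metadata” on a repository upload can look like. The field names follow Zenodo’s documented deposit-metadata schema, but verify them against the current API documentation before uploading; the names and the ORCID below are placeholders:

```python
import json

def presentation_metadata(title, author, orcid, conference):
    """Build a deposit-metadata dict in the style of Zenodo's REST API."""
    return {
        "metadata": {
            "upload_type": "presentation",
            "title": title,
            "creators": [{"name": author, "orcid": orcid}],
            "conference_title": conference,
        }
    }

payload = presentation_metadata(
    "My conference presentation",
    "Doe, Jane",
    "0000-0000-0000-0000",  # placeholder ORCID
    "DH2019",
)
print(json.dumps(payload, indent=2))
```

Filling in these fields at upload time is what later lets the presentation be found via the conference community and linked to your ORCID record.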

For Session and Panel Chairs & Moderators

  • #1 Aim for diversity in your panel. Do you really want to end up with a panel of white men (no #manel!)? Read Barbara Bordalejo, Minority Report.
  • #2 Prepare the introduction of the speakers, especially how to pronounce their names if they are in a language foreign to you. Try to prepare at least one memorable fact about them and use their titles, but avoid reading a 5-minute-long biography. Pro tip: You can always e-mail them for this information and give a word limit.
  • #3 Insist that speakers and the audience use the microphone. Read why this matters in Jessie B. Ramey’s blog post A Note from your Colleagues with Hearing Loss: Just Use a Microphone Already.
  • #4 Try to find out how presenters feel about social media, especially about photographs of them presenting being posted online, and tell the audience the results; also specify the conference hashtag (and session hashtag). I have seen these hashtags written on blackboards, whiteboards, etc.
  • #5 Stick to timing (no. 1)! Don’t allow presenters to steal time from others. Bring aids to help you stick to the timing (cards with minutes left, like 2 min and 1 min, a siren on your phone, an egg timer, etc.) and tell the speakers about your timing policy, e.g. that you will remove the microphone if they go over time, or walk over and thank them.
  • #6 Stick to timing (no. 2)! Also, don’t go over time in general and steal time from the break. 
  • #7 Stick to the sessions as they are planned in slots (e.g. 30 min.), in order to allow the audience to switch to other rooms and to allow individual feedback for all speakers.
  • #8 Prepare a question for each speaker to make sure that all speakers get a question, but invite the audience to ask first. It is important to leave people some time to gather their thoughts and courage, so wait at least about 30 seconds, though this may seem long to you.
  • #9 Let the first question be asked by a woman. Read Maggie Kuo on why this matters: ‘Women ask fewer questions than men at conference talks, new studies suggest’. I could not find the source anymore for the evidence that if a woman asks the first question, it encourages others to do likewise, but I often see that once one woman has asked a question, others follow suit.
  • #10 If you feel bold, comment on overly long questions that are not questions, but a reminder of the person’s own achievements 😉

For Organizers

  • #1 Think about the environment and accessibility: Would it make sense to make your event virtual or to allow virtual participation? Check out Diethart, Mario; Zimmermann, Anne; Mulà, Ingrid (2020). Guidelines for Virtual Conferencing… (https://boris.unibe.ch/id/eprint/139254) to get inspired about virtual conferencing, or my article about webinars: Wuttke, Ulrike (2019). “The “PARTHENOS eHumanities and eHeritage Webinar Series” … In: LIBER Quarterly, 29(1), pp. 1–35. DOI: http://doi.org/10.18352/lq.10257.

Last but not least, these are hacks that I have found work for me. Some might work for you and your field, others might not. And yes, I have made mistakes before and probably will again. Not all hacks suit every individual (or institutional) situation.

As I am not the first to have thought about this topic (obviously!), here are some great general resources and reads:

  • Twitter thread on How to Chair:

What do you think? Do you have other hacks and know any more great resources that should be mentioned? Please leave a comment below or tweet!

Fellowship of the Data

“Here be dragons”: Open Access to Research Data in the Humanities

Slightly modified reading version of my talk at the “Innovative Library in Digital Era” (ILIDE) 2019 Conference, Jasná, Slovakia, 9.04.2019

Cite as: Ulrike Wuttke, “Here be dragons”: Open Access to Research Data in the Humanities, Blogpost, 09.04.2019, CC-BY 4.0. Link: https://ulrikewuttke.wordpress.com/2019/04/09/open-data-humanities/.

In this rather lengthy blog post (which is more like a pre-print of a future article), I discuss the paradigm shift to reusable, machine-readable data as one pillar of Open Science or Open Scholarship (the latter being a more inclusive term for the Arts and Humanities) for Humanities and Heritage researchers. I address key challenges and perspectives of Humanities Research Data Management with a focus on educational aspects and available tools, especially the Research Data Management Organiser (RDMO) tool (https://rdmorganiser.github.io/).
This blogpost was awarded a travel bursary and a poster presentation at the DARIAH Annual Event 2019 in Warsaw in the Open Humanities Tools and Methods Blog Competition.


Introduction

#1 It has been estimated by Stefan Winkler-Nees (from the Deutsche Forschungsgemeinschaft, DFG) in 2011 that 90% of all digital research data is lost.[1] We don’t know how much of this data belonged to the Humanities, and hopefully these numbers are better today, but we can assume that a lot of Humanities data (and other data) is still lost, because of missing infrastructures or because no one took care of the long-term availability of this data in time.[2]

Are Your Data FAIR?

But even if data is not lost: Does the available data spark joy, to borrow a term from the ubiquitous Marie Kondo? Are these datasets accessible for research, well documented using standards, available in interoperable formats, etc.; in short: is it FAIR[3] data, too?

#2 The digital transformation of research has dramatically increased the creation, gathering, and use of research data in all disciplines. It has also led to the development of the new paradigm of Open Science, which not only promotes that published results should be open access, but also that the underlying data should be open (for example in the H2020 Open Research Data Pilot).[4] Together, these developments have resulted in a heightened demand for the publication of research data, promoted by national and international funding agencies[5] and research institutions, often directly reflected in (funding) guidelines, which can be more or less binding.[6] Among the many reasons underpinning the demand for Open Research Data, efficiency (reuse), reproducibility (transparency), impact, and public trust are prominent.[7] Slowly, the responsible handling of research data and the FAIR publication of research data are becoming an integral part of good scientific practice.[8]

The vision goes beyond that: we are talking about the grand vision of the European Open Science Cloud (EOSC)[9] and an even bigger vision of a global Open Science ecosystem. To realise this vision, and for humanities data to play a considerable role in it, our “efforts should be guided by the twin aims of ensuring that data meets the FAIR principles, and that it is effectively preserved in trusted, certified repositories”[10]. The FAIR principles have gained great acceptance as a guiding principle since their publication in 2016[11], not only because they take into account that not all data can be open[12], leading, for example, the European Research Council to formulate the principle “as open as possible, as closed as necessary”[13], but also because they put great emphasis on “the attributes data need to have to enable and enhance reuse, by humans and machines”, in a nutshell: metadata.[14]

Vision for Humanities Data

In the following, I will first outline key challenges related to Open Access to research data in the humanities and then discuss some perspectives for improving the current situation. I will focus on educational aspects and on tools for research data management as keystones for Open & FAIR research data. As the German situation is the one I know best, examples are mainly drawn from a German context.

“Here be dragons”

#4 The phrase “Hic sunt dracones” (transl. “Here be dragons”) is used on some old maps of the world to describe an area that was unknown to the cartographer. I found it quite appropriate to summarise the ambivalence of humanists towards data and all these “fancy” concepts discussed by “infrastructure people”, like FAIR data, the EOSC, or Research Data Management. First of all, it must be said that humanities researchers tend to be ambivalent about the concept of ‘data’[15] and that “[t]here are issues surrounding […] the acceptance of the ‘research data concept’”[16]. In short, they just don’t use the word “data”, but talk about “sources”, “research materials”, etc., with the result that the whole “data talk” doesn’t appeal to them. Additionally, an exploratory survey conducted by PARTHENOS in 2017 among researchers in the domains of digital humanities, language studies, and cultural heritage showed that the FAIR principles and the EOSC, concepts and recommendations thriving among “infrastructure folks”, are relatively little known in the research communities themselves.[17] Often, the publication of research data comes only as an afterthought (if at all).[18] At the end of a project, however, it is often too late to publish the data in a meaningful way because of the lack of documentation and the lack of resources to prepare the data properly for publishing.

#5 Let me discuss some reasons that currently seem to form barriers to establishing a culture of open sharing in the humanities. In general, “issues surrounding incentivisation”[19] can be observed. Given the strong competition and the traditional humanities reputation system based on ‘long story’ scientific publication formats (monographs, book chapters, or articles as the significant scientific publications) as opposed to ‘data’, there is little motivation to publish research data.[20] This lack of incentives goes hand in hand with researchers’ (perceived) fear of being scooped, i.e. that someone else will be the first to publish something based on their data, e.g. an exciting manuscript or object they have found.[21] A research tradition that has been based on (and rewarded) secretiveness is not easy to change when nothing fundamental is offered as a reward in exchange. Other prejudices often brought up are: nobody will understand the data, nobody will need the data, someone will sell the data, and, last but not least, a (perceived) lack of technical skills.[22]

“The inherent controversy in the meaning of “data” and the importance of personal interpretation on data for humanities researchers is not conducive to sharing.”[23]

#6 Even those who are willing to publish data are facing obstacles, although it must be duly acknowledged that there is a long and thriving tradition of humanities corpora collection and publication that continues in the digital age, e.g. at the academies. Legal issues, or doubts about possible legal issues, are especially often mentioned in this context: the legal regulations concerning research data are complicated and not internationally aligned, and there are many actors involved in the production of humanities research data, not only humanists. As a consequence, humanities research is often based on data under copyright restrictions (from cultural heritage institutions or other actors), which makes it difficult to publish it as Open Research Data.[24]

#7 Additionally, we are dealing with issues around the availability and sustainability of specialist support structures for humanities research data, as well as a lack of practical guidance.[25] Humanities data centres and other data services for the humanities are often dependent on third-party (project-based) funding.[26] This leads to issues of trust on the side of the researchers, which may result in these services not being in high demand[27] (and may even result in an unwillingness to “go digital” at all); it also leads to the problem of sustaining “living systems”[28] (which need to be frequently updated, migrated, and curated). However, current efforts, especially from the Digital Humanities community, have also led to positive developments:

“For example, the emergence of linked open data over the past decade has been supported both by the establishment of effective standards for modeling and disseminating such data, and the growth of practices and social expectations supporting its creation. These developments have meant that expertly modeled data from specific domains can be accessed and combined flexibly, rather than remaining in isolation or striving for self-sufficiency.”[29]

Fellowship of the Data

#8 To sum up my observations so far: humanities research data is, in general, rather heterogeneous, idiosyncratic, and complex[30], and humanists are ambivalent about the term “data”. Digital practices are already part of the research activities of many humanists, especially in the Digital Humanities, but they are not everywhere equally developed. As a result, the potential of digital research data and methods is not fully exploited, because the digital research process is not carefully planned; in other words, much research data already exists in digital form, but it is not findable, quality controlled, and reusable.[31] All in all, the land of FAIR research data is still unknown territory for many humanists, or at least as scary as if dragons did indeed live there. In the next part, therefore, I will argue for increased efforts in awareness raising and skills building and for a “fellowship of the data”, a support system to facilitate the quest for FAIR data in the humanities.

Perspectives

#9 Naturally, I cannot offer immediate solutions, but I would like to point out some paths that need to be pursued with increased intensity in the near future to facilitate FAIR data in the humanities (and beyond). These paths can focus on different stakeholders in the FAIR ecosystem, such as research institutions, funding bodies, or publishers, or on individual researchers and research communities. In my opinion, special attention has to be paid to the researchers and research communities themselves, so that recommendations, policies, services, etc. are aligned with disciplinary practices and cultures and known within them.[32] If the researchers are not on our side in this quest, we are prone to lose the battle, or at least to experience a delay in realising the goals.

#10 The most urgent points, in my opinion, are the following. We need to work on incentives for the FAIR publication of research data, e.g. a wider adoption of DORA (Declaration on Research Assessment).[33] We also need to invest in the development of beneficial environments for aggregation (think EOSC, or the German NFDI):

“The interdisciplinary bundling of humanities data repositories and the development of adequate research tools and services for linked data represents a great opportunity for humanities research.”[34]

Another highly important task is educating the next (and the current) generation of (digital) humanities researchers[35] to deal with the datafication[36] of research and education practices, but also educating “infrastructure people” in discipline-specific contexts. Humanists need to be aware of the limits of publication tools and of how they expose (or fail to expose) the underlying data model. We need to strive to offer more idiosyncratic options and take time to consider critically how early choices have an impact on how data is published and can be used (the curatorial perspective). It is our task to prevent research data from becoming trapped (in specific formats or hardware). In an ideal world we “design our data without tool dependencies”, a “tool-agnostic” rather than a “tool-dependent” approach[37]. The progressing digital transformation of humanities research, along with the increasing importance of digital research infrastructures, calls not only for a certain level of “data literacy”[38] but even for an expansion of this concept to a certain level of “data infrastructure literacy”, a term recently coined by Gray et al. 2018[39].

Much discussed in the context I have just outlined is Research Data Management:

“Research Data Management describes the process to curate (or manage) research data along the research data lifecycle and includes various activities such as planning, producing, selection, analysis, archiving, and preparation for reuse. Because data are very heterogeneous, discipline and data specific solutions can be required.”[40]

Given this, admittedly not very appealing-sounding definition, it may not come as a surprise that researchers in general still consider research data management[41] an extra, tedious, time-consuming task that diverts them from “real research”, and humanists especially consider it almost opposed to the hermeneutic humanistic research practice[42]. However, acceptance by the researchers is the key success factor for establishing standards in Research Data Management.[43] Therefore, we need to show humanists the added value of Research Data Management for the planning process of digital projects. Lemaire (2018) has recently convincingly argued that RDM is a process already inherent in the research process itself (although at the moment a rather implicit one) and that it can be an instrument for reviewing the research concept.[44]

#12 The digital turn of humanities research and the requirement for FAIR research data, that is, a sustainable and quality-controlled handling of research data, require thorough planning of the digital research process before it starts, a process that is guided and documented in a Research Data Management Plan.[45] Given all the advantages of Research Data Management planning, efforts should be increased regarding awareness raising and skills building[46], already as part of university curricula[47], and regarding tools for consistent data management planning (and handling)[48] with the end in mind, that is (if possible) the publication of FAIR research data.

RDMO[49]

#13 Given that the management of research data is increasingly regarded as a process of active support and care during the whole research process (and not merely the production of a static document), tools are needed that support active Research Data Management Planning[50], e.g. by providing different and up-to-date status information to the different participants of the research process. I am currently part of a project that is developing such a tool, the Research Data Management Organiser (RDMO).[51]

#14 In a nutshell, with RDMO the research data management process can be organised as a collaborative effort encompassing its different stakeholders, besides researchers, especially infrastructure partners such as libraries or computing centres. One of the use scenarios of RDMO is library staff using RDMO’s question catalogue to work out the data management strategy for projects with researchers and other partners/experts.[52] RDMO can be adapted to the requirements of communities or organisations (e.g. institutional or discipline-specific guidelines) and has multilingual capabilities. At the moment we are working with a range of very different institutions and communities in Germany to further improve this tool.
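To illustrate what "active" data management planning means in practice, here is a purely hypothetical sketch (this is NOT RDMO's actual data model or API): instead of a static PDF, the plan is a structured, machine-readable record that the different stakeholders can query and update over the life of the project.

```python
import json

# Hypothetical sketch -- not RDMO's real schema -- of one entry
# in a living, machine-readable data management plan.
dmp_entry = {
    "project": "Example Edition Project",
    "dataset": "TEI-XML transcriptions",
    "license": "CC BY 4.0",
    "repository": "to be decided",
    "responsible": {
        "researcher": "principal investigator",
        "infrastructure": "university library",
    },
    "status": "data collection ongoing",
    "last_updated": "2019-04-09",
}

# Because the plan is structured data, status information can be
# extracted programmatically for different stakeholders.
print(json.dumps(dmp_entry, indent=2))
```

A record like this could be updated whenever a decision is made (e.g. the repository is chosen), which is the difference between planning as an ongoing activity and planning as a one-off document.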

Conclusion

#15 On the one hand, researchers need to be aware of the issues at hand and take responsibility (take the issues of digital research practices seriously!).[53] On the other hand, we need to work on the institutional availability and sustainability of research data (management) and support (e.g. via data centres and local experts)[54], and on a clever and efficient connection between initiatives on different levels. We need to provide adequate RDM tools that support researchers in preparing their data for publication from the very beginning. Libraries already play an important role in this ecosystem, and I dare say that they are capable of taking, and well placed to take, a leading role in this field in the future if they invest in it: in heads and infrastructure(!).[55] In particular, I would like to point to their role in the creation of vast digital corpora of open research data, which cannot be left to researchers alone.[56] Last but not least, we also need to make research data management less scary: it is not a scientific revolution, and it does not mean that all the skills learned so far in a typical humanities curriculum are to be thrown overboard; quite the opposite is true.[57]

#16 There is a lot at stake for the humanities, perhaps the very question of what we want the future of the humanities to be. When it comes to Open and FAIR research data in the humanities, I can only say it with Queen: "I want it all, and I want it now!"


“I want it all, I want it all, I want it all, and I want it now.” Queen (1989).


To create this broad culture of FAIR data sharing in the humanities we have to roll up our sleeves, team up, and distribute hats:

  1. Embrace Open principles,
  2. bridge the gap between the digital and the humanities and see what we can learn from the Digital Humanities and other more data-savvy disciplines.[58]

What are your thoughts and suggestions on this topic? Do you agree? Do you have additional pointers to more discipline-specific information and insight about the handling of research data, data sharing, and the FAIR principles in the humanities? Please leave your thoughts below or contact me via Twitter or e-mail. I am looking forward to discussing this topic further with you!

Notes

[1] See Stefan Winkler-Nees, Vorwort, In: Büttner, Stephan; Hobohm, Hans-Christoph; Müller, Lars (ed.): Handbuch Forschungsdatenmanagement, Bad Honnef 2011, p. 5.

[2] See Geisteswissenschaftliche Datenzentren im deutschsprachigen Raum, Grundsatzpapier zur Sicherung der langfristigen Verfügbarkeit von Forschungsdaten, Version 1.0, DHd AG Datenzentren, 2018, p. 9. Accessed from http://doi.org/10.5281/zenodo.1134760, 25.03.2019.

[3] See Wilkinson, M. D., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018. Accessed from https://doi.org/10.1038/sdata.2016.18, 25.03.2019. Webseite FORCE11: https://www.force11.org/group/fairgroup/fairprinciples.

[4] See European Research Council (ERC), Guidelines on Implementation of Open Access to Scientific Publications and Research Data in projects supported by the European Council under Horizon 2020, Version 1.1., 21. April 2017. Accessed from http://ec.europa.eu/research/participants/data/ref/h2020/other/hi/oa-pilot/h2020-hi-erc-oa-guide_en.pdf, 25.03.2019. See also from the UK document Concordat on Open Research Data, 2016. Accessed from https://www.ukri.org/files/legacy/documents/concordatonopenresearchdata-pdf/, 25.03.2019 or the recommendation from the Steuerungsgremium der Schwerpunktinitiative „Digitale Information“ der Allianz der deutschen Wissenschaftsorganisationen (2017), Den digitalen Wandel in der Wissenschaft gestalten: Die Schwerpunktinitiative „Digitale Information“ der Allianz der deutschen Wissenschaftsorganisationen, Leitbild 2018 – 2022, 2017, Accessed from: http://doi.org/10.2312/allianzoa.015.

[5] See for example EC Directorate-General for Research & Innovation, H2020 Programme: Guidelines on FAIR Data Management in Horizon 2020, Version 3.0, 26. July 2016, Accessed from: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf, 25.03.2019, Deutsche Forschungsgemeinschaft /DFG), Leitlinien zum Umgang mit Forschungsdaten, 30.09.2015. Accessed from: http://www.dfg.de/download/pdf/foerderung/antragstellung/forschungsdaten/richtlinien_forschungsdaten.pdf, 25.03.2019.

[6] While German funders already give recommendations, other funders already issued more binding guidelines that include the demand for a DMP, such as the Schweizerische Nationalfonds (SNF). See Schweizerische Nationalfonds, Open Research Data, (o.J.). Accessed from: http://www.snf.ch/en/theSNSF/research-policies/open_research_data/Pages/default.aspx#FAIR%20Data%20Principles%20for%20Research%20Data%20Management, 25.03.2019.

[7] “Open research data (ORD) have the potential not only to deliver greater efficiencies in research, but to improve its rigour and reproducibility, to enhance its impact, and to increase public trust in its results.” Open Research Data Task Force, Realising the potential: Final Report of the Open Research Data Task Force. N.P., 2018, p. 3. Accessed from https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/775006/Realising-the-potential-ORDTF-July-2018.pdf, 22.03.2019.

[8] See recommendation 7 “Sicherung und Aufbewahrung von Primärdaten” of the DFG-Denkschrift zur Sicherung der guten wissenschaftlichen Praxis, in: Deutsche Forschungsgemeinschaft (DFG), Sicherung guter wissenschaftlicher Praxis: Empfehlungen der Kommission „Selbstkontrolle in der Wissenschaft“, Weinheim 2013, p. 21-22. Accessed from: http://doi.org/10.1002/9783527679188.oth1, 25.03.2019.

[9] https://ec.europa.eu/research/openscience/index.cfm?pg=open-science-cloud.

[10] Open Research Data Task Force, Realising the potential: Final Report of the Open Research Data Task Force. N.P., 2018, p. 8. Accessed from https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/775006/Realising-the-potential-ORDTF-July-2018.pdf, 25.03.2019.

[11] The term FAIR was launched in 2014. See https://www.force11.org/group/fairgroup/fairprinciples.

[12] Principles for Open Data in Science have been formulated in the Panton Principles, demanding that data should be placed in public domain. See Panton Principles, Principles for open data in science. Murray-Rust, Peter; Neylon, Cameron; Pollock, Rufus; Wilbanks, John; (19 Feb 2010). Retrieved 31.03.2019 from https://pantonprinciples.org/. This demand has been criticized as difficult to realize because of two main reasons 1) the legal system of some countries, including Germany, does not really allow complete renunciation of rights by the right holder (i.e. public domain) 2) it removes all obligations to quote, which remove an important incentive, see Ulrich Herb, Open Science in der Soziologie. Eine interdisziplinäre Bestandsaufnahme zur offenen Wissenschaft und eine Untersuchung ihrer Verbreitung in der Soziologie, Glückstadt 2015, p. 126, accessed from: https://doi.org/10.5281/zenodo.31234, 25.03.2019.

[13] European Research Council (ERC), Guidelines on Implementation of Open Access to Scientific Publications and Research Data in projects supported by the European Council under Horizon 2020, Version 1.1., 21. April 2017, p. 6. Accessed from http://ec.europa.eu/research/participants/data/ref/h2020/other/hi/oa-pilot/h2020-hi-erc-oa-guide_en.pdf, 25.03.2019. Not all data needs to be published. A selection of what data is relevant and interesting for scientific reuse, like archives have always done, is necessary. For guidelines see for example  Angus Whyte & Andrew Wilson, How to appraise and select research data for curation, Digital Curation Centre How-to Guides, Edinburgh 2010. Accessed from: http://www.dcc.ac.uk/resources/how-guides/appraise-select-data, 25.03.2019.

[14] European Commission Expert Group on FAIR Data, Turning FAIR into reality: Final Report and Action Plan from the European Commission Expert Group on FAIR Data, Brussels 2018, p. 18, accessed from: https://ec.europa.eu/info/publications/turning-fair-reality_en, 25.03.2019. The FAIR principles provide guidance on “how to facilitate knowledge discovery by assisting humans and machines in their discovery of, access to, integration and analysis of, task-appropriate scientific data and their associated algorithms and workflows” Quote from: https://www.force11.org/group/fairgroup/fairprinciples, accessed: 25.03.2019.

[15] See for example Christof Schöch, Big? Smart? Clean? Messy? Data in the Humanities, in: Journal of Digital Humanities, 2 (2013): 3. Accessed from: http://journalofdigitalhumanities.org/2-3/big-smart-clean-messy-data-in-the-humanities/, 25.03.2019, Miriam Posner, Humanities Data: A Necessary Contradiction | Miriam Posner’s Blog, 25.06.2015. Accessed from: http://miriamposner.com/blog/humanities-data-a-necessary-contradiction/, 25.03.2019. Insightful discussions of the complexity of talking about data in the Humanities are also offered for example by Jennifer Edmond, Georgina Nugent-Folan, D2.1 Redefining what data is and the terms we use to speak of it, KPLEX (Knowledge Complexity) Deliverable D2.1, 2018. Accessed from: https://kplexproject.files.wordpress.com/2018/07/d2-1-redefining-what-data-is-and-the-terms-we-use-to-speak-of-it.pdf, 25.03.2019, Fabian Cremer, Lisa Klaffki, Timo Steyer, T., Der Chimäre auf der Spur: Forschungsdaten in den Geisteswissenschaften. o-bib 5 (2018): 2, p. 142–162. Accessed from: https://doi.org/10.5282/o-bib/2018H2S142-162, 25.03.2019.

[16] Open Research Data Task Force, Case Studies: Annex to the final report of the Open Research Data Task Force, 2018, p. 7. Accessed from: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/775379/Case-studies-ORDTF-July-2018.pdf, 25.03.2019.

[17] See PARTHENOS, The FAIR principles and the EOSC concept in the research community of Digital Humanities, Language Studies and Cultural Heritage: An expeditionary survey, 2017, esp. p. 7-8, Accessed from: http://www.parthenos-project.eu/Download/PARTHENOS_FAIR_EOSC_survey.pdf, 25.03.2019.

[18] See Elke Brehm & Janna Neumann, Anforderungen an Open-Access-Publikationen von Forschungsdaten: Empfehlungen für einen offenen Umgang mit Forschungsdaten, in: o-bib 2018: 3, p. 1-16, p. 3. Accessed from: https://doi.org/10.5282/o-bib/2018H3S1-16, 25.03.2019.

[19] See for example Open Research Data Task Force, Case Studies: Annex to the final report of the Open Research Data Task Force, 2018, p. 7. Accessed from: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/775379/Case-studies-ORDTF-July-2018.pdf, 25.03.2019.

[20] The humanities are no exception here. See for example Benedikt Fecher, Cornelius Puschmann, Über die Grenzen der Offenheit in der Wissenschaft: Anspruch und Wirklichkeit bei der Bereitstellung und Nachnutzung von Forschungsdaten, in: Information – Wissenschaft & Praxis 66 (2015): 2-3, p. 146-150, p. 147. Accessed from: https://doi.org/10.1515/iwp-2015-0026, 25.03.2019, Fabian Cremer, Lisa Klaffki, Timo Steyer, T., Der Chimäre auf der Spur: Forschungsdaten in den Geisteswissenschaften. o-bib 5 (2018): 2, p. 142–162, p. 148. Accessed from: https://doi.org/10.5282/o-bib/2018H2S142-162, 25.03.2019.

[21] See for example Christine L. Borgman, Big Data, Little Data, No Data: Scholarship in the networked World. Cambridge, Mass; London, 2015, p. 177-179, where she characterises humanities data as often being considered as “club goods” (p. 177), meaning access is only granted to very specific individuals, such as local researchers. She describes on the example of the Dead Sea Scrolls this practice of local control (“hoarding”, p. 178), which stems from the fact: “Once scholars obtain access to materials, they may wish to mine them in private until they are ready to publish.” (p. 178).

[22] See for general observations about the (not) sharing of research data for example Carol Tenopir et al., Data Sharing by Scientists: Practices and Perceptions, in: PLOS ONE 6 (6), 29.06.2011, p. e21101, https://doi.org/10.1371/journal.pone.0021101,Veerle van den Eynden & Libby Bishop, Incentives and motivations for sharing research data, a researcher’s perspective. A Knowledge Exchange Report, 2014, accessed from: http://repository.jisc.ac.uk/5662/1/KE_report-incentives-for-sharing-researchdata.pdf, 25.03.2019, Benedikt Fecher et al., What Drives Academic Data Sharing? PLOS ONE, 10(2015):2, p. e0118053. Accessed from: https://doi.org/10.1371/journal.pone.0118053, 25.03.2019, Ulrich Herb, Open Science in der Soziologie. Eine interdisziplinäre Bestandsaufnahme zur offenen Wissenschaft und eine Untersuchung ihrer Verbreitung in der Soziologie, Glückstadt 2015, p. 134-143, accessed from: https://doi.org/10.5281/zenodo.31234, 25.03.2019, Ben Kaden, Warum Forschungsdaten nicht publiziert werden, in: LIBREAS.Dokumente, LIBREAS.Projektberichte, 13.03.2018, accessed from https://libreas.wordpress.com/2018/03/13/forschungsdatenpublikationen/, 25.03.2019.

[23] Open Research Data Task Force, Case Studies: Annex to the final report of the Open Research Data Task Force, 2018, p. 6. Accessed from: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/775379/Case-studies-ORDTF-July-2018.pdf, 25.03.2019.

[24] Concerning prevalent legal issues see for example Madeleine de Cock Buning et al., The legal status of research data in the Knowledge Exchange partner countries, 2011. Accessed from: http://repository.jisc.ac.uk/6280/, 25.03.2019, Bastian Drees, Text und Data Mining: Herausforderungen und Möglichkeiten für Bibliotheken. Perspektive Bibliothek, 5 (2016): 1, p. 49–73, esp. p. 59-61. Accessed from: http://dx.doi.org/10.11588/pb.2016.1.33691, Elke Brehm & Janna Neumann, Anforderungen an Open-Access-Publikationen von Forschungsdaten: Empfehlungen für einen offenen Umgang mit Forschungsdaten, in: o-bib 2018: 3, p. 1-16, p. 8-11. Accessed from: https://doi.org/10.5282/o-bib/2018H3S1-16, 25.03.2019, Anne Lauber-Rönsberg, Philipp Krahn, Paul Baumann, Gutachten zu den rechtlichen Rahmenbedingungen des Forschungsdatenmanagements im Rahmen des DataJus-Projekts (Kurzfassung), 2018. Accessed from: https://tu-dresden.de/gsw/jura/igewem/jfbimd13/ressourcen/dateien/publikationen/DataJus_Kurzfassung_Gutachten_12-07-18.pdf?lang=de&set_language=de, 25.03.2019.

[25] “The time and effort required to make research data open and accessible in accordance with the FAIR principles (Findable, Accessible, Interoperable, Re-usable) can be considerable; and those researchers who are keen to adopt ORD practices may find themselves stymied by a lack of practical guidance and specialist support.” Open Research Data Task Force, Realising the potential: Final Report of the Open Research Data Task Force. N.P., 2018, p. 23. Accessed from https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/775006/Realising-the-potential-ORDTF-July-2018.pdf, 25.03.2019. This report acknowledges that some materials have already been developed (p. 24), which are, from the author’s perspective, often too general, and calls to increase efforts for training and education (p. 24).

[26] See for example Rat für Informationsinfrastrukturen (RfII), Leistung aus Vielfalt: Empfehlungen zu Strukturen, Prozessen und Finanzierung des Forschungsdatenmanagements in Deutschland, 2016, p. 37-39, Accessed from http://www.rfii.de/?wpdmdl=1998, 25.03.2019, DHd AG Datenzentren, Geisteswissenschaftliche Datenzentren im deutschsprachigen Raum:  Grundsatzpapier zur Sicherung der langfristigen Verfügbarkeit von Forschungsdaten (Version 1.0). DHd AG Datenzentren, 2018, p. 24. Accessed from http://doi.org/10.5281/zenodo.1134760, 25.03.2019.

[27] See Open Research Data Task Force, Case Studies: Annex to the final report of the Open Research Data Task Force, 2018, p. 7. Accessed from: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/775379/Case-studies-ORDTF-July-2018.pdf, 25.03.2019.

[28] “Lebende Forschungsanwendungen spielen in den Geisteswissenschaften eine zunehmend große Rolle in der digitalen Ergebnissicherung und -präsentation. Im Gegensatz zur Buchpublikation ist jedoch die dauerhafte Erhaltung, Betreuung und Bereitstellung dieser lebenden Systeme eine technische und organisatorische Herausforderung. Während es vergleichsweise einfach möglich ist reine Forschungsdaten in Datenarchiven für die Nachwelt zu konservieren, sind lebende Systeme Teil eines digitalen Ökosystems und müssen sich diesem, z.B. in Form von Updates, regelmäßig anpassen.“ Andreas Witt et al., Forschungsdatenmanagement in den Geisteswissenschaften an der Universität zu Köln, in: o-bib 5 (2018): 3, p. 104–117, p. 111. Accessed from https://doi.org/10.5282/o-bib/2018h3s104-117, 25.03.2019. See also the website of the Project SustainLife at Cologne University: https://dch.phil-fak.uni-koeln.de/sustainlife.html, accessed 25.03.2019.

[29] Julia Flanders & Fotis Jannidis, Data Modeling in a digital humanities context: An introduction. In: Julia Flanders & Fotis Jannidis (ed), The shape of data in the digital humanities: Modeling texts and text-based resources, London, New York, 2019, p. 3–25, here p. 20.

[30] See Open Research Data Task Force, Case Studies: Annex to the final report of the Open Research Data Task Force, 2018, p. 6. Accessed from: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/775379/Case-studies-ORDTF-July-2018.pdf, 25.03.2019.

[31] See Marina Lemaire, Vereinbarkeit von Forschungsprozess und Datenmanagement in den Geisteswissenschaften: Forschungsdatenmanagement nüchtern betrachtet. o-bib 5 (2018):4, p. 237–247, here p. 237-238. Accessed from: https://doi.org/10.5282/o-bib/2018H4S237-247, 25.03.2019.

[32] See Open Research Data Task Force, Realising the potential: Final Report of the Open Research Data Task Force. N.P., 2018, p. 23. Accessed from https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/775006/Realising-the-potential-ORDTF-July-2018.pdf, 22.03.2019.

[33] The DORA declaration recommends to give credit for more than only articles, for example also for data sets and software, see: https://sfdora.org/read/, accessed 25.03.2019.

[34] Original: “In der interdisziplinären Bündelung geisteswissenschaftlicher Datenrepositorien und der Entwicklung adäquater Forschungswerkzeuge und -dienste für verknüpfte Daten liegt eine große Chance für die geisteswissenschaftliche Forschung.“ Torsten Schrade, Im Datenozean, in: F.A.Z., 2.12. 2018, Accessed from: https://www.faz.net/-in2-9h3jj, 25.03.2019.

[35] During the last years, increasing attention has been paid to Digital Humanities pedagogy and the development of specific Digital Humanities curricula. For Digital Humanities pedagogy see for example David B. Hirsch (ed.), Digital Humanities Pedagogy: Practices, Principles and Politics, 2012. Accessed from: http://www.openbookpublishers.com/reader/161, 25.03.2019, or Matthew K. Gold, Debates in the Digital Humanities. Minneapolis, 2012, Section V. Accessed from: http://dhdebates.gc.cuny.edu/debates/1. For curricula see for example Patrick Sahle, DH studieren! Auf dem Weg zu einem Kern- und Referenzcurriculum der Digital Humanities. Göttingen: GOEDOC, Dokumenten- und Publikationsserver der Georg-August-Universität, 2013. Accessed from http://webdoc.sub.gwdg.de/pub/mon/dariah-de/dwp-2013-1.pdf, or IANUS, Statement zu minimalen IT-Kenntnissen für Studierende der Altertumswissenschaften, 2017. Accessed from https://www.ianus-fdz.de/projects/ausbildung_qualifizierung/wiki/Empfehlungen_zu_minimalen_IT-Kenntnissen, 25.03.2019. The need not only to be able to use tools and modeling systems, but to be able “to intervene in this ecology by designing more expressive modeling systems, more effective tools, and a compelling pedagogy through which colleagues and new scholars can gain an expert purchase on these questions as well” has been underlined recently by Julia Flanders & Fotis Jannidis, Data Modeling in a digital humanities context: An introduction. In: Julia Flanders & Fotis Jannidis (ed), The shape of data in the digital humanities: Modeling texts and text-based resources, London, New York, 2019, p. 3–25, here p. 20. See also ibid. p. 23. Data literacy, including data modelling literacy, is indispensable to exercise control “over our data during its entire life cycle”, ibid. p. 12.

[36] The term datafication seems to have been coined in the publication by Kenneth Neil Cukier & Viktor Mayer-Schoenberger, The Rise of Big Data: How It’s Changing the Way We Think About the World. Foreign Affairs (2013). Accessed from https://www.foreignaffairs.com/articles/2013-04-03/rise-big-data, 25.03.2019.

[37] For this aspect see Julia Flanders & Fotis Jannidis, Data Modeling in a digital humanities context: An introduction. In: Julia Flanders & Fotis Jannidis (ed), The shape of data in the digital humanities: Modeling texts and text-based resources, London, New York, 2019, p. 3–25, esp. p. 13-15, quotes p. 14 and p. 15.

[38] “Die zunehmende Digitalität in den Geisteswissenschaften macht dabei den Aufbau einer Data Literacy, also einer grundlegenden Datenkompetenz von Lernenden, Lehrenden und Forschenden, unerlässlich.” Torsten Schrade, Im Datenozean, in: F.A.Z., 2.12. 2018, Accessed from: https://www.faz.net/-in2-9h3jj, 25.03.2019.

[39] See Jonathan Gray, Carolin Gerlitz & Liliana Bonegru, Data infrastructure literacy. Big Data & Society (2018), p. 1–13. Accessed from: https://doi.org/10.1177/2053951718786316, 25.03.2019.

[40] Translated from: AG Forschungsdaten der Schwerpunktinitiative “Digitale Information” der Allianz der deutschen Wissenschaftsorganisationen, Forschungsdatenmanagement: Eine Handreichung, 2018, p. 4. Accessed from: http://doi.org/10.2312/allianzoa.029, 25.03.2019

[41] Sometimes the term data curation seems to be used (wrongly) as a synonym.

[42] See Marina Lemaire, Vereinbarkeit von Forschungsprozess und Datenmanagement in den Geisteswissenschaften: Forschungsdatenmanagement nüchtern betrachtet. o-bib 5 (2018):4, p. 237–247, here p. 238. Accessed from: https://doi.org/10.5282/o-bib/2018H4S237-247, 25.03.2019.

[43] See Andreas Witt et al., Forschungsdatenmanagement in den Geisteswissenschaften an der Universität zu Köln, in: o-bib 5 (2018): 3, p. 104–117, here p. 106. Accessed from https://doi.org/10.5282/o-bib/2018h3s104-117, 25.03.2019.

[44] See Marina Lemaire, Vereinbarkeit von Forschungsprozess und Datenmanagement in den Geisteswissenschaften: Forschungsdatenmanagement nüchtern betrachtet. o-bib 5 (2018):4, p. 237–247, here p. 244-245. Accessed from: https://doi.org/10.5282/o-bib/2018H4S237-247, 25.03.2019. See also: “The point here is not that these costs are prohibitive or unjustified, but rather that good strategic planning involves balancing the costs and benefits, and focusing the effort in areas that offer a clear advantage.” Julia Flanders & Fotis Jannidis, Data Modeling in a digital humanities context: An introduction. In: Julia Flanders & Fotis Jannidis (ed), The shape of data in the digital humanities: Modeling texts and text-based resources, London, New York, 2019, p. 3–25, p. 8.

[45] See Marina Lemaire, Vereinbarkeit von Forschungsprozess und Datenmanagement in den Geisteswissenschaften: Forschungsdatenmanagement nüchtern betrachtet. o-bib 5 (2018):4, p. 237–247, here p. 245 (accessed from: https://doi.org/10.5282/o-bib/2018H4S237-247, 25.03.2019), who describes that the biggest difference of the digitized research process in the humanities is that researchers need to plan the research process more detailed at an earlier stage, describing their methods more explicit in order to come to machine readable data (processes).

[46] See Marina Lemaire, Vereinbarkeit von Forschungsprozess und Datenmanagement in den Geisteswissenschaften: Forschungsdatenmanagement nüchtern betrachtet. o-bib 5 (2018):4, p. 237–247, here p. 245-246, accessed from: https://doi.org/10.5282/o-bib/2018H4S237-247, 25.03.2019.

[47] See for example Andreas Witt et al., Forschungsdatenmanagement in den Geisteswissenschaften an der Universität zu Köln, in: o-bib 5 (2018): 3, p. 104–117, here p. 113. Accessed from https://doi.org/10.5282/o-bib/2018h3s104-117, 25.03.2019. The authors describe that Research Data Management is already established in the curriculum of the humanities faculty of Cologne University.

[48] “There is a need for new guidance and exemplars to ensure that data meets appropriate quality standards; for tools to standardise and automate data management, documentation and curation processes; and for an increased focus on improving research software, and on recruiting and retaining software engineers.” Open Research Data Task Force, Realising the potential: Final Report of the Open Research Data Task Force. N.P., 2018, p. 8. Accessed from https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/775006/Realising-the-potential-ORDTF-July-2018.pdf, 22.03.2019.

[49] About RDMO see esp. the detailed article Heike Neuroth et al., Aktives Forschungsdatenmanagement. ABI Technik, 38(2018):1, p. 55–64. Accessed from https://doi.org/10.1515/abitech-2018-0008, 25.03.2019. See also the RDMO project website: https://rdmorganiser.github.io/, accessed 25.03.2019.

[50] There are several tools available to help create a Research Data Management Plan (DMP), but not with a focus on active data management.

[51] My home organisation, the University of Applied Sciences Potsdam (FHP), is currently developing such a tool, the Research Data Management Organiser (RDMO), together with the project partners AIP (Leibniz-Institut für Astrophysik Potsdam) and KIT (Karlsruhe Institute of Technology), funded by the Deutsche Forschungsgemeinschaft (DFG).

[52] Research Data Management should not be a sole task for researchers, but they definitely have to be on board.

[53] “As data creators, academics have a different, more knowing relationship to their data: they create data that is going to be a persistent part of the research environment, and they act as both its creators, managers, and consumers. The stakes of the modeling decisions for research data are thus much higher, and to the extent that these decisions are mediated through tools, there is  significant value—even a burden of responsibility—in understanding that mediation. And within the academy, the stakes for digital humanists are highest of all, since their research concerns not only the knowing and critical use of data models, media, and tools, but also their critical creation.” Julia Flanders & Fotis Jannidis, Data Modeling in a digital humanities context: An introduction. In: Julia Flanders & Fotis Jannidis (ed), The shape of data in the digital humanities: Modeling texts and text-based resources, London, New York, 2019, p. 3–25, here p. 11-12.

[54] For the recommendation of local contact persons see for example, Andreas Witt et al., Forschungsdatenmanagement in den Geisteswissenschaften an der Universität zu Köln, in: o-bib 5 (2018): 3, p. 104–117, esp. p. 106, 115.

[55] See Paul Ayris et al., LIBER Open Science Roadmap, 2018, p. 18-19. Accessed from:  http://doi.org/10.5281/zenodo.1303002, 25.03.2019.

[56] This point especially relates to the creation of corpora that are digitized and made accessible in meaningful ways for research purposes, e.g. HathiTrust Digital Library (https://www.hathitrust.org/) or the Deutsche Textarchiv (DTA) (http://www.deutschestextarchiv.de/). See for example the recommendations in Lisa Klaffki, Stefan Schmunk, Thomas Stäcker, Stand der Kulturgutdigitalisierung in Deutschland: Eine Analyse und Handlungsvorschläge des DARIAH–DE Stakeholdergremiums „Wissenschaftliche Sammlungen“, Göttingen: GOEDOC, Dokumenten- und Publikationsserver der Georg-August-Universität, 2018 (DARIAH-De Working Papers, 26), Accessed from: http://webdoc.sub.gwdg.de/pub/mon/dariah-de/dwp-2018-26.pdf, 04.04.2019.

[57] See Marina Lemaire, Vereinbarkeit von Forschungsprozess und Datenmanagement in den Geisteswissenschaften: Forschungsdatenmanagement nüchtern betrachtet. o-bib 5 (2018):4, p. 237–247, here p. 245-246, Julia Flanders & Fotis Jannidis, Data Modeling in a digital humanities context: An introduction. In: Julia Flanders & Fotis Jannidis (ed), The shape of data in the digital humanities: Modeling texts and text-based resources, London, New York, 2019, p. 3–25, here p. 3, Torsten Schrade, Im Datenozean, in: F.A.Z., 2.12. 2018, Accessed from: https://www.faz.net/-in2-9h3jj, 25.03.2019.

[58] See for example Julia Flanders & Fotis Jannidis, Data Modeling in a digital humanities context: An introduction. In: Julia Flanders & Fotis Jannidis (ed), The shape of data in the digital humanities: Modeling texts and text-based resources, London, New York, 2019, p. 3–25, here p. 5.

The text of this blog post is published under the license CC-BY 4.0.

Cite as: Ulrike Wuttke, “Here be dragons”: Open Access to Research Data in the Humanities, Blogpost, 09.04.2019, CC-BY 4.0. Link: https://ulrikewuttke.wordpress.com/2019/04/09/open-data-humanities/.

Libraries as the Third Wheel? Integrating Digital Humanities and Libraries (a data science perspective)

Last week I had the honour of being invited by Michiel Cock, Esther ten Dolle and Steven Claeyssens as a panel member to their very timely round table discussion at DHBenelux 2018 about “Integrating libraries and Digital Humanities”. My fellow panelists were Hilde De Weerdt, Max Kemman, and Sally Chambers. Each of us was asked to prepare a short statement to reflect on the libraries’ role in the Digital Humanities.

There is definitely a lot to say about the libraries’ role in the Digital Humanities, as the full room and the heated discussion showed, fueled by controversial statements that the organizers of the Round Table had made up for the audience to vote on via Kahoot (what a great idea, note to myself!). While there was a tendency to assign libraries an omnipresent role (because there were many librarians in the room?), there was also the perspective, drawn from Max Kemman’s research, that libraries are more like a third wheel in the Digital Humanities (Note: Max left the room alive!).

We all agreed that more research should be published Open Access. This is definitely an area where libraries are destined to play an important role! While, as so often, I am convinced that the answer lies somewhere in the middle and that libraries are not the third wheel but a partner of the Digital Humanities, I would like to share the 3-minute statement I was asked to prepare on “support for DH: the library as a center for expertise on data science?”. Once the organizers share their wrap-up of the discussion, I will share the link too.

 


My statement “support for DH: the library as a center for expertise on data science?”

Preamble: What is Data Science? There are many definitions; I go along with this one:

“Data science exists on a spectrum and can span work that requires deep statistical and software engineering skills, to work focusing on advocacy, policy development, data management planning, and evidence-based decision making.” (Matt Burton, Liz Lyon, Chris Erdmann, Bonnie Tijerina, Shifting to Data Savvy: The Future of Data Science In Libraries, 2018, p. 6)

Data Science is undeniably at the heart of the Digital Humanities. So, what is the library's role in relation to DH and Data Science? In my opinion, DH and libraries are destined to be “partners in crime” when it comes to Data Science. I see this as a continuation of the traditional role of libraries, adapted to the requirements of the digital transformation. Libraries can play (and often already are playing) a role in DH as centres for Data Science along the phases of the research (data) lifecycle. They:

  • provide open cultural heritage collections as data fit for computational use (the “collections as data” paradigm): both a technical task and a paradigm change
  • provide tools and infrastructure
  • support Research Data Management (planning)
  • are involved in Data Science education (courses, trainings, and training materials for researchers, involvement in teaching curricula) > standards
  • consult on principles, methods, and tools

To sum up: the role of libraries and librarians in relation to Data Science lies:

  1. in the area of initial consultancy about concepts, methods and tools
  2. in providing skills, know-how and tools/infrastructure to support researchers

These forms of support can be institutionalised through data librarians or collaboration on projects, or libraries can act as facilitators or partners of Digital Humanities centres.

However, there are some clouds on this clear blue sky which I would like to address …

  • Changes and adjustments are necessary on the side of the library: we have to pay attention to the skills and knowledge that librarians need to fulfil THEIR role in the DH. Coming from an institution that educates future librarians, I think here first of all of the integration of new topics and skills into the education curricula of librarians (at bachelor's and master's degree level) as well as of continued DH and data science training for librarians (e.g. trainings to embrace Open principles, technical upskilling). Last but not least: give them time for education and provide suitable opportunities!
  • Closely related to this is the need for a closer integration of libraries into the DH; round tables like this one are a clear sign that this is changing.
  • Last but not least, I would like to stress that the library is not alone in this task (and doesn't have to do it all!): seek and act in collaboration with computing centres etc., bundle the available data science competencies at the library and beyond in some kind of virtual DH centre, and “advertise”/communicate them to the scholarly community; this alone is often already a great improvement!


Leipzig and DH: Impressions

Summary: A lot has changed in the humanities since I had my first academic job in the context of an edition project at Leipzig University. The ongoing digital transformation of all humanities disciplines calls for more self-reflection on methodologies and for early as well as lifelong training. With the European Summer University in Digital Humanities and other important DH activities and actors, Leipzig is a DH hot spot and therefore was a very fitting place for a presentation of the PARTHENOS Training Suite.

From 2003 to 2005 I held my first position as a research assistant in Dutch Studies at Leipzig University, where I worked on two editions of Middle Dutch texts. This July I finally had the opportunity to return to Leipzig. My last visit, the DHd conference in Leipzig (2016), which I still remember very fondly (not least because of my first visit to the legendary Faustian Auerbachs Keller!), was already a while ago. The occasion was an invitation to present the PARTHENOS Training Suite at the European Summer University in Digital Humanities 2017. The GWZ on Beethovenstraße, my old workplace, still exists, but much has changed, both at Leipzig University and in scholarly editing. Exactly the right backdrop for a personal reflection.


ESU 2017 Poster

In 2003, digital scholarly editing and above all the TEI (Text Encoding Initiative) were still in their infancy, or rather still far from the enormous methodological influence they would later exert (link: history of the TEI). Even then, it went without saying that the Middle Dutch editions were to be produced digitally. But within the project, “digital” meant using a word processor, not XML editors or TUSTEP, which was still widespread at the time but far too complex for most use cases.

The editions of the two Middle Dutch texts were published long ago. I do not know whether the text files still exist, but even if they do, the end product was not a digital or hybrid edition but a print edition. Four thoughts:

  • For the text files, only the “look” of the print version mattered; there was no standardized markup, as the TEI makes possible, that would allow these texts to be reused in other contexts, in portals, or with the help of tools, quite apart from online access.
  • On the other hand, the editor had an easy time of it and could readily combine scholarly and technical workflows in one person. If you knew the basic “quirks” of your word processor, the learning curve was relatively flat.
  • Today the roles are much more differentiated, which is not least one of the reasons why digital editions are produced far more often by teams than by individuals.
  • Version control was a nightmare, especially when others took over checking or partial tasks in between, since everything happened in a single document and (long before cloud-based collaboration tools) offline.
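The first point, print layout versus standardized markup, can be illustrated with a minimal sketch. The TEI fragment below is hypothetical (a made-up snippet in the style of a Middle Dutch edition, not from the actual project files); only the TEI namespace is real. Parsed with Python's standard library, it shows how standardized markup makes line structure and editorial annotations machine-addressable, which a print-oriented word-processor file never was.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal TEI-like fragment of an edited Middle Dutch text.
# Real TEI files carry a full teiHeader; the namespace below is the real one.
tei = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <l n="1">Van den vos <persName>Reynaerde</persName></l>
    <l n="2">die <sic>auenture</sic> seghet men ons</l>
  </body></text>
</TEI>"""

ns = {"tei": "http://www.tei-c.org/ns/1.0"}
root = ET.fromstring(tei)

# Because the markup is standardized, tools can address the structure directly:
lines = ["".join(l.itertext()) for l in root.findall(".//tei:l", ns)]
names = [p.text for p in root.findall(".//tei:persName", ns)]
print(lines)  # verse lines, independent of any print layout
print(names)  # editorial annotations become queryable data
```

Any TEI-aware portal or tool can run the same kind of query against any conforming edition; with a word-processor file, each reuse would start from scratch.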

It is by now almost common knowledge that the increased scholarly use of digital tools, methods, and standards such as the TEI in the humanities calls for additional qualifications and for methodological reflection. Of course, subject expertise must not be neglected: if no one can read historical manuscripts anymore and the editorial and subject-specific know-how and methodological understanding are missing, TEI guidelines and XML editors are of little help… That is why tailored training and continuing education offerings, whether within university curricula or as workshops, summer and winter schools, and online courses, are immensely important for everyone from students and research assistants to professors. Not only to teach the practice, but also to reflect on its advantages, disadvantages, and potential improvements. In the end, every product lives on user feedback, and its “market value” grows with its visibility and adoption. Dutch has a very apt proverb for this: “Onbekend maakt onbemind” (unknown makes unloved)… During my master's in scholarly editing, for example, I was “only” taught TUSTEP and InDesign for digital editions; I only became aware of the power of the TEI later, when I began to take a stronger interest in digital scholarly editing. Thanks to DH Oxford!

The European Summer University (ESU) in Digital Humanities, directed by Prof. Elisabeth Burr, is a very successful example of a format that both demonstrates the potential of digital research and teaches the required skills hands-on while critically questioning the methods. Particularly exciting about the ESU is the broad “spread” of its audience, both geographically and sociologically (in the sense that the participants really do range from students to professors). A fitting place, then, to present the training and teaching materials and formats developed by the H2020 project PARTHENOS, the PARTHENOS Training Suite, in a project presentation. Once again, many thanks for the invitation and the perfect organisation!


Display at the ESU registration desk

The ESU and Prof. Burr's chair are, however, not the only DH hot spots in Leipzig. Particularly worth mentioning are, of course, the Humboldt Chair of Digital Humanities, held by Prof. Gregory Crane and his team, but also the DH activities of the Universitätsbibliothek Leipzig. Regarding the latter, one could almost say that DH and libraries are a match made in heaven. Libraries usually hold exactly the collections with which one can “do” DH. But how does the DH community get at this data? It cannot digitize everything itself; that would not only be inefficient but is also not always possible. A task that libraries are therefore increasingly taking on is digitization and the provision and archiving of the data from digitization projects. It is especially welcome for research when this happens within the framework of an Open Digitization Strategy, as at the Universitätsbibliothek Leipzig, and the data are, for example, presented via Digital Collections and passed on to other search and processing systems.


Universitätsbibliothek Leipzig (Albertina)

Last but not least, Leipzig is also home to one of the two sites of the Deutsche Nationalbibliothek, whose Strategic Priorities 2017–2020 are strongly shaped by digital innovation.

If this has made you want to travel to Leipzig: besides DH, the impressive terminus station and the atmospheric city centre alone are worth the trip. If the trip has to wait a little longer, a virtual visit to the DNB is recommended, more precisely to the exhibition of the Deutsches Buch- und Schriftmuseum, Zeichen – Bücher – Netze (2014). Leipzig is worth it!


Leipzig Hauptbahnhof