One-Day Introduction to Topic Modeling with LDA and Distributional Semantics, 30 May 2013

The day was an initiative of the Vlaamse Werkgroep Medievistik and was held at the University of Antwerp. Our teacher was Martin Riedl of the Technische Universität Darmstadt.

During the day Martin introduced us to topic modeling, a research method from the Digital Humanities that, in theory, lets us analyze huge textual corpora by their main topics in a relatively simple way. The underlying algorithms, such as LDA (Latent Dirichlet Allocation), look very complicated, but special software lets you ignore them and simply shows you the most common clusters of words, which in the ideal case each represent a topic in the corpus. In an ideal world, then, topic modeling shows you the main topics in a corpus, how these topics are distributed over its documents, and how words are distributed within each topic. The software can then also assign these topics to new documents, which may help to structure a corpus.
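The day worked with ready-made software rather than code, but the core of LDA is small enough to sketch. Below is a minimal Python sketch of one common way to fit LDA, collapsed Gibbs sampling; it is not the software used at the workshop, and the function names (`train_lda`, `top_words`, `doc_topics`, `infer_topics`), the hyperparameter values and the toy corpus are illustrative assumptions. Each document ends up with a distribution over topics, each topic with word counts (the "clusters of words"), and `infer_topics` assigns the learned topics to an unseen document while keeping the topics fixed.

```python
import random


def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns vocab and the count tables: ndk[d][k] (tokens of document d in
    topic k), nkw[k][v] (word v in topic k), nk[k] (all tokens in topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]
    nkw = [[0] * V for _ in range(K)]
    nk = [0] * K
    z = []  # z[d][i]: current topic of the i-th token of document d
    for d, doc in enumerate(docs):  # start with a random topic per token
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token from the counts, then resample its topic with
                # p(k) proportional to (ndk + alpha) * (nkw + beta) / (nk + V * beta).
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return vocab, ndk, nkw, nk


def top_words(vocab, nkw, k, n=5):
    """The n most frequent words of topic k, i.e. the cluster one inspects."""
    ranked = sorted(range(len(vocab)), key=lambda v: nkw[k][v], reverse=True)
    return [vocab[v] for v in ranked[:n]]


def doc_topics(ndk, d, alpha=0.1):
    """Smoothed topic proportions of training document d."""
    K, n = len(ndk[d]), sum(ndk[d])
    return [(ndk[d][t] + alpha) / (n + K * alpha) for t in range(K)]


def infer_topics(doc, vocab, nkw, nk, alpha=0.1, beta=0.01, iterations=100, seed=0):
    """Topic proportions of a new document; the trained topics stay fixed."""
    rng = random.Random(seed)
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), len(nk)
    tokens = [word_id[w] for w in doc if w in word_id]  # unknown words are skipped
    z = [rng.randrange(K) for _ in tokens]
    counts = [z.count(t) for t in range(K)]
    for _ in range(iterations):
        for i, v in enumerate(tokens):
            counts[z[i]] -= 1
            weights = [(counts[t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                       for t in range(K)]
            z[i] = rng.choices(range(K), weights=weights)[0]
            counts[z[i]] += 1
    return [(counts[t] + alpha) / (len(tokens) + K * alpha) for t in range(K)]


# Hypothetical toy corpus with two obvious themes.
docs = [
    "knight sword battle castle knight".split(),
    "sword battle knight castle siege".split(),
    "monk prayer abbey psalm monk".split(),
    "abbey psalm prayer monk cloister".split(),
]
vocab, ndk, nkw, nk = train_lda(docs, num_topics=2)
for k in range(2):
    print("topic", k, top_words(vocab, nkw, k, n=4))
print("document 0:", doc_topics(ndk, 0))
print("new text:", infer_topics("siege of the castle".split(), vocab, nkw, nk))
```

This pure-Python version only makes the mechanics visible. On a corpus this small the outcome depends on the random seed, and it would be far too slow for a real corpus.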

However, you cannot simply take, say, a digitized Middle Dutch text and run the software. First of all, your results will be spoiled by structural and very common words (think of articles, auxiliary verbs, and so on). The unstandardized medieval orthography also causes problems, because the same word spelled in different ways is counted as different words. There are solutions for this, such as removing the very common “non-content” words and automatically standardizing the orthography. Mike Kestemont is currently working on applying this technique to medieval texts, and some first results look quite promising.

To learn more about topic modeling in the Digital Humanities, its methods, advantages and problems, and theoretical reflections on it, I find the Journal of Digital Humanities issue devoted to topic modeling very interesting. For most medievalists the underlying technique may be too complicated to grasp, but the good news is that many computer scientists who master these techniques seem more than willing to apply them to interesting corpora, so one should not be afraid to seek co-operation!
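The two preparatory steps mentioned above, removing very common function words and standardizing spelling, can be sketched in a few lines of Python. This is a toy illustration only: the stoplist, the three spelling rules and the 80% document-frequency cutoff are hypothetical choices, not the method of the research mentioned above, and real normalization of Middle Dutch is much harder than a handful of string substitutions.

```python
import re
from collections import Counter

# Hypothetical spelling rules mapping variant spellings onto one form.
# A real Middle Dutch normalizer would need many more rules or a trained model.
SPELLING_RULES = [
    (re.compile(r"gh"), "g"),          # ghelike -> gelike
    (re.compile(r"y"), "ij"),          # wyf -> wijf
    (re.compile(r"^c(?=[aou])"), "k"), # coninc -> koninc (word-initial only)
]

# Hypothetical stoplist of frequent function words, in normalized spelling.
STOPWORDS = {"ende", "en", "die", "de", "den", "dat", "het", "van", "in",
             "te", "ten", "hi", "hij", "si", "sij", "was", "es", "een"}


def normalize(token):
    """Lowercase a token and apply the spelling rules in order."""
    token = token.lower()
    for pattern, replacement in SPELLING_RULES:
        token = pattern.sub(replacement, token)
    return token


def preprocess(texts, max_doc_fraction=0.8):
    """Tokenize, normalize, drop stopwords and words found in too many documents."""
    docs = []
    for text in texts:
        tokens = [normalize(t) for t in re.findall(r"\w+", text)]
        docs.append([t for t in tokens if t not in STOPWORDS])
    # Document frequency: in how many documents does each word occur?
    df = Counter(w for doc in docs for w in set(doc))
    limit = max_doc_fraction * len(docs)
    return [[w for w in doc if df[w] <= limit] for doc in docs]


texts = [
    "Die coninc ende sijn ridders reden ten strijde",
    "Die ridders volgden den coninc ende den heer",
    "Die monc ende die abt songhen psalmen",
]
for doc in preprocess(texts):
    print(doc)
```

The frequency cutoff complements the stoplist: it also catches words that are not function words but occur in almost every document and therefore say little about any particular topic.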

