From Modelling to Transcription: Workshop Notes from DHd2026
During DHd2026 in Vienna, many discussions revolved around how digital tools shape the way we work with texts and data. Instead of trying to summarise the entire conference, this blog post focuses on the workshops I attended during the first days and on a few ideas that stayed with me throughout the week.
Looking back at my notes, I realised they already suggested a structure for this post. The workshops I attended raised questions about modelling, transcription, and data that later reappeared in other panels and keynotes during the conference.
Note 1: Starting with Practice
My first two days at the conference were shaped by workshops, and that felt like a good way to begin. Rather than starting with big claims about digital humanities, I started by sitting down with tools, notebooks, scripts, and a lot of practical questions.
On the first day, I attended the workshop “Beyond Entities: Inhaltsbasierte Erschließung digitaler Editionen mit KI.” We worked with Python, APIs, and Jupyter notebooks to extract RDF triples from TEI-encoded early modern letters. What I found especially interesting was that the workflow did not stop at named entities. Instead, it tried to model conceptual relations in the texts. Themes such as emotion, illness, or social order became part of a semantic structure that could then be visualised and analysed further.
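To give a flavour of what "extracting triples" from TEI can mean in practice, here is a minimal sketch of my own, not the workshop's actual notebook code. The element choices, the `ref` attributes, and the "theme" annotation are hypothetical examples, and the modelling rule (link every mentioned person to every theme in the letter) is deliberately naive:

```python
# Toy illustration of triple extraction from a TEI-encoded letter.
# Not the workshop's code; names and annotations are made up.
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

sample_letter = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p>
    <persName ref="#maria">Maria</persName> writes about her
    <term type="theme">illness</term> to
    <persName ref="#johann">Johann</persName>.
  </p></body></text>
</TEI>
"""

def extract_triples(tei_xml: str):
    """Turn persName and thematic term elements into naive triples."""
    root = ET.fromstring(tei_xml)
    persons = [e.get("ref", e.text) for e in root.iter(f"{TEI_NS}persName")]
    themes = [e.text for e in root.iter(f"{TEI_NS}term")
              if e.get("type") == "theme"]
    # Naive modelling choice: relate every mentioned person to every theme.
    return [(person, "mentionsTheme", theme)
            for person in persons for theme in themes]

for triple in extract_triples(sample_letter):
    print(triple)
```

Even a toy like this makes the workshop's point about non-neutrality visible: the decision to pair every person with every theme is already an interpretive claim, and a real pipeline would replace it with more careful rules or model prompts.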

I liked that this workshop stayed close to the material while still asking what kinds of structures can be made visible through computational methods. It also made very clear that modelling is never neutral. Even at the level of prompts and extraction rules, decisions shape what the final data looks like.
Note 2: From Audio to Text
The second workshop I attended, “Vom Audio zum Text: Automatisierte Transkriptionen mit Whisper,” shifted the focus from written material to spoken language. We looked at automated transcription workflows, compared tools, and worked through Python-based pipelines for transcription and speaker diarisation.
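One step in such a pipeline that came up repeatedly is attaching speaker labels from a diarisation pass to the transcript segments an ASR model produces. The sketch below is my own reconstruction of the idea, not the workshop's code: the timestamps and labels are invented, and in practice the segments would come from Whisper and the speaker turns from a diarisation tool.

```python
# Sketch: label transcript segments with speakers by temporal overlap.
# All data here is made up for illustration.

segments = [  # (start_sec, end_sec, text) as an ASR model might emit
    (0.0, 3.2, "Welcome everyone to the session."),
    (3.4, 6.1, "Thanks, glad to be here."),
]
speaker_turns = [  # (start_sec, end_sec, speaker) from diarisation
    (0.0, 3.3, "SPEAKER_00"),
    (3.3, 6.5, "SPEAKER_01"),
]

def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two time intervals, in seconds."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def label_segments(segments, turns):
    """Assign each transcript segment the speaker with the most overlap."""
    labelled = []
    for start, end, text in segments:
        best = max(turns, key=lambda t: overlap(start, end, t[0], t[1]))
        labelled.append((best[2], text))
    return labelled

for speaker, text in label_segments(segments, speaker_turns):
    print(f"{speaker}: {text}")
```

The "most overlap wins" rule is itself a simplification that breaks down exactly where qualitative researchers care most: overlapping speech gets forced onto a single speaker, which previews the discussion of limits below.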

What stayed with me most was the discussion after the practical part. Our reflections quickly clustered around three terms: transfer, opportunities, and limits.
The question of transfer came up in relation to both teaching and research. We talked about how automated transcription might be built into thesis work, methods courses, or training materials. There were ideas about shared standards, open educational resources, and also about making these workflows usable for people who are not deeply technical.
The opportunities were easy to see. Automated transcription can save time, give a quick overview of larger amounts of audio material, and make certain kinds of corpus building much more realistic. Reusable code and adaptable workflows also make it easier to test different research setups.
At the same time, the workshop discussion was just as much about the limits. In qualitative work especially, transcripts are never just raw text. Things like pauses, laughter, overlap, hesitation, and speaker dynamics matter, and automated systems do not capture all of this equally well. We also kept coming back to transparency, validation, and the need to check what a model is actually doing.
For me, this workshop was useful precisely because it did not present automation as a magic solution. It showed where these tools can help, but also where they flatten the material.
Note 3: The Keynote as a Frame
Only after these two workshops came the opening keynote by Miriah Meyer, “Data As ___________: Exploring the Plurality of Data in Visualization.” By that point, I already had the workshop experiences in mind, and that made the keynote even more interesting to listen to.
What I took from it was not one single definition of data, but the opposite. Data appeared here as something plural, shaped, and dependent on context. Meyer spoke about data as entangled, as design material, and as connection. That fit surprisingly well with what I had just seen in the workshops. Whether we are extracting semantic triples from letters or producing transcripts from audio, data do not simply appear ready-made. They are produced through tools, settings, modelling choices, and research interests.
The question “Is data graffiti data?”, asked while we looked at the emoji stickers museum visitors had placed on a data sheet, stayed with me because it made this point in a way that was funny and sharp at the same time. It pushed against the idea of data as something clean and self-evident.
Looking Back at the Rest of the Week
For me, the workshops were an amazing start to DHd2026 because they made it possible to move back and forth between trying things out and thinking about what those methods actually imply for research practice.
As the conference continued, I noticed that many of the themes from the workshops reappeared in other sessions. Panels such as “Not Just Text, Intertext!” and projects like NAKAR returned to questions of modelling, connection, and interpretation. The final keynote by Katharina Kinder-Kurlanda then made the political and epistemic side of data work even more explicit.
Looking back at my notes now, this connection between experimentation and reflection is probably my main takeaway from the week. The interesting part is not only that digital tools can do more and more. They also force us to ask more precise questions about modelling, interpretation, transparency, and what we even mean when we call something data. In that sense, the workshops were not just an introduction to tools, but also to the questions that come with using them.
This blog post was written as part of a travel grant for the DHd 2026 conference. My sincere thanks go to NFDI4Memory for supporting my participation, and to the conference organizers for their excellent work in making the event such a positive experience.