
Unlocking the Value of Clinical Texts for Research and AI | 9th GeMTeX Plenary Meeting May 13th, 2025 in Leipzig

About 25 people working on the GeMTeX project met last week at the Albertina Library of Leipzig University to discuss the current status of the annotation work and the future use of the GeMTeX text corpus.

The goal of GeMTeX is to process medical documents such as medical reports and discharge summaries in a way that allows them to be used for research and artificial intelligence applications in compliance with data protection regulations. A key step in this process is the annotation of clinical documents. For this purpose, medical students at six clinical sites mark content from these documents and enrich them with metadata.

Annotation guidelines evolve with the project

Last year, the focus was initially on de-identification: information that could reveal a person’s or institution’s identity—such as names, locations, or birthdates—was anonymized and automatically replaced with pseudonyms. Now, many partner sites are beginning the next phase: semantic annotation. In this step, medical content such as diagnoses or procedures is categorized. To ensure these complex annotations remain comparable across sites, the GeMTeX team has developed detailed guidelines for semantic annotation, which continue to evolve as the project progresses.

At the plenary meeting, Justin Hofenbitzer, a research associate at the Technical University of Munich, presented the current status of semantic annotation at the Munich site and shared practical insights and challenges. The annotation guidelines have been refined based on the Munich annotation team’s experiences, for example in distinguishing between recommendations and indications in medical reports.

Annotation experiences are being evaluated

Another key focus of the meeting was the evaluation of the annotation work. Andrea Riedel and Jakob Faller from University Hospital Erlangen, Christina Lohr from Leipzig University, and Justin Hofenbitzer presented a survey conducted among the annotators. The survey aims to provide insights into how much time annotation takes, how helpful pre-annotations are, and what professional value the medical students perceive in working with clinical texts they wouldn’t normally encounter during their studies.
Following this, participants at the plenary meeting discussed the technical status of the annotation platforms and software solutions used for annotation.

With the project scheduled to end in August next year, the GeMTeX team has already begun planning evaluation projects and the reuse of the text corpus.

The next GeMTeX plenary meeting will take place online on September 17, 2025.

By the way: Visitors to this year’s Medical Informatics Europe Conference 2025 in Glasgow can learn how to replicate the GeMTeX text corpus. On May 21, GeMTeX network coordinator Prof. Dr. Martin Boeker and research associates Dr. Frank Meineke, Andrea Riedel, Justin Hofenbitzer, and Christina Lohr will hold a workshop where participants can try out de-identifying and semantically annotating documents themselves. Further information is available on the EFMI MIE 2025 conference website.