Ready for the annotation work | 5th GeMTeX Plenary Meeting on May 15 and 16 in Leipzig

The GeMTeX project team started its work almost one year ago. Last week, more than 40 members met at the Albertina Library of Leipzig University to discuss the milestones achieved so far and to plan the upcoming annotation work, which is scheduled to start on June 1, 2024.

GeMTeX focuses on developing a collection of texts from clinical patient care. At the core of the project is the annotation work at the university hospitals of Charité Berlin, Dresden, Erlangen, Essen, Leipzig and TU Munich. Clinical texts are annotated with content and structural metadata so that they can be used to train language models.

Annotation platform updated with additional features

The GeMTeX text corpus is based on data released for research by patients through the Medical Informatics Initiative’s (MII) Broad Consent. “We have reached an important milestone: the data protection concept and the study protocol for the annotation work in GeMTeX have been approved by the ethics committee of the Technical University of Munich,” said Professor Martin Boeker of TU Munich, the GeMTeX project leader, at the plenary meeting. Before the texts can be used in GeMTeX, each site must first obtain a data protection and ethics vote. The other sites are in the process of submitting their applications to their respective ethics committees.

Next, in a session moderated by Frank Meineke from Leipzig University, the current state of the technical infrastructure was discussed. The team from TU Darmstadt has extensively updated the annotation platform INCEpTION so that, for example, matches in annotation groups can now be visualized more clearly. Another significant update is the curation of annotations, which makes it possible to visualize where annotators made different decisions and to choose one annotation over the other.

First sites train their assistants in annotation

Luise Modersohn, scientific project manager for GeMTeX and head of the junior research group DE.xt at the Technical University of Munich, moderated the session on annotation. The focus was on the first versions of the guidelines for general semantic annotation. The four domain-specific annotations in cardiology, neurology, pharmacology and oncology were presented by the respective working group leaders: Philipp Wiesenbach (Heidelberg University Hospital), Aliaksandra Shutsko (ZB Med), Annette Härdtlein (LMU University Hospital Munich) and Florian Borchert (Hasso Plattner Institute).

In addition, training materials for annotators and for the coordination of annotation groups are now available. The collection contains documents with work instructions and short videos, for example an introduction to the INCEpTION tool and explanations of the annotation rules.

Student assistants at the Leipzig, Erlangen and Essen sites have already started to mark the text passages to be de-identified and to familiarize themselves with INCEpTION. They are currently working with freely available synthetic texts, which are not subject to data protection and are very important for testing the processes. Semantic annotation of diagnoses and drugs will start in June.

The next GeMTeX plenary meeting will be held online on September 26.