
“The texts from clinical care should be available to the entire research community” | 3rd GeMTeX plenary meeting on November 20–21, 2023 in Munich

Work on the GeMTeX method platform, a project of the Medical Informatics Initiative (MII), has been underway for six months. On November 20 and 21, 2023, around 30 GeMTeX team members met at the Faculty of Medicine at the Technical University of Munich to discuss the initial progress and milestones of the project.

GeMTeX focuses on the development of a data collection built from texts from clinical patient care. Annotation work at the university hospitals Charité Berlin, Dresden, Erlangen, Essen, Leipzig and TU Munich is at the heart of the GeMTeX project. Clinical texts are annotated with metadata on content and structure so that they can be used, for example, to train language models as an application of natural language processing.

In his welcoming address, project leader Prof. Dr. Martin Boeker, Professor of Medical Informatics at the Technical University of Munich, emphasized the sustainable approach being pursued in developing the text corpus as part of the GeMTeX project: “We are not just doing this for the Medical Informatics Initiative – we want to make the medical texts available to the entire community.” Once the work has been completed, the text corpus will be made available via the ZB MED German National Library of Medicine.

Comprehensive guidelines for annotations are being developed

The first day focused on the annotation work, which is scheduled to start on March 1, 2024. The annotation guidelines are currently being developed by the Annotation Working Group, headed by Luise Modersohn (TU Munich). They set out the specifications according to which texts are enriched with additional information.

The GeMTeX project is inspired by the international EU-funded AIDAVA project, which deals with the automatic collection and provision of patient data. Prof. Dr. Stefan Schulz from the Medical University of Graz presented the development of an annotation guideline for AIDAVA, which is largely based on SNOMED CT. He also showed how SNOMED CT, a terminology system in widespread international use, can be used for mapping text fragments.

As part of the GeMTeX project, a general annotation guideline is being developed that is also largely based on SNOMED CT and does not need to be adapted for a specific use case or data source. Christina Lohr from Leipzig University presented the current state of development of this guideline. In addition to the general annotation guideline, application-specific annotation guidelines will be created for specific medical questions. These include guidelines for the annotation of texts from cardiology, oncology and neurology, as well as texts that may describe potential adverse drug interactions.

Technical implementation of the GeMTeX project is in preparation

The second day of the GeMTeX plenary meeting was dedicated to technical topics. Dr. Frank Meineke, technical project manager of the GeMTeX project, presented the results of a survey on the technical status of the sites. An important component of this status is the current number of informed patient consents (Broad Consent), which form the basis for the development of the GeMTeX text corpus. The aim is to have at least 60,000 documents from the six Data Integration Centers involved in the project available for the text corpus.

The last part of the event was devoted to the software required to implement the GeMTeX project. The INCEpTION annotation platform, developed at TU Darmstadt, will be used to annotate the texts. Serwar Basch, research assistant at TU Darmstadt, reported on the latest developments around an INCEpTION dashboard for detailed evaluation of the annotation work. The Berlin-based company ID GmbH & Co. KGaA is providing the terminology server for the project. André Sander, Head of Technical Development at ID, presented the current status of the terminology server at the plenary meeting.

Finally, Franz Matthies (Leipzig University) presented a reference platform that combines software for automated pre-annotation from industry partner Averbis GmbH with the INCEpTION annotation platform. The platform is to be rolled out at the participating sites.

The next plenary meeting will take place online on February 27, 2024.