{"id":26281,"date":"2026-07-01T12:24:42","date_gmt":"2026-07-01T10:24:42","guid":{"rendered":"https:\/\/www.smith.care\/?p=26281"},"modified":"2026-07-01T12:58:01","modified_gmt":"2026-07-01T10:58:01","slug":"gemtex-llm-workshop-plenary-meeting-06-26","status":"publish","type":"post","link":"https:\/\/www.smith.care\/en\/2026\/07\/01\/gemtex-llm-workshop-plenary-meeting-06-26\/","title":{"rendered":"Advancing the Use of Clinical Texts for Research and Artificial Intelligence"},"content":{"rendered":"\n<p><strong>GeMTeX LLM Workshop and Plenary Meeting, June 22\u201323, 2026<\/strong><\/p>\n\n\n\n<p>Unstructured clinical documents such as physicians&#8217; notes and discharge summaries contain a wealth of valuable data for medical research. Making these data available for research and artificial intelligence (AI) applications, securely and in compliance with data protection regulations, is one of the main objectives of the <a href=\"https:\/\/www.smith.care\/en\/gemtex_mii\/about-gemtex\/\">GeMTeX<\/a> project within the German <a href=\"https:\/\/www.medizininformatik-initiative.de\/en\/start\" target=\"_blank\" rel=\"noreferrer noopener\">Medical Informatics Initiative (MII)<\/a>.<\/p>\n\n\n\n<p>On June 22 and 23, 2026, GeMTeX project members met at TUM University Hospital Rechts der Isar for an internal Large Language Model (LLM) Workshop and the project&#8217;s plenary meeting.<\/p>\n\n\n\n<p><strong>Preparing for the Final Phase of the GeMTeX Project<\/strong><\/p>\n\n\n\n<p>The plenary meeting focused on the project&#8217;s current progress and the preparation of its final phase. Justin Hofenbitzer (TUM University Hospital Rechts der Isar) presented a major milestone: more than 1,000 clinical documents from six university hospitals have now been semantically annotated.<\/p>\n\n\n\n<p>As part of the annotation process, medical students identify relevant information in clinical documents and enrich it with metadata, making the texts machine-readable. 
These high-quality datasets provide the foundation for research and the development of new methods in clinical natural language processing.<\/p>\n\n\n\n<p><strong>Publication of the GeMTeX Text Corpus Moves Closer<\/strong><\/p>\n\n\n\n<p>The project has also made significant technical progress. Jakob Faller (University Hospital Erlangen) presented the new <a href=\"https:\/\/www.smith.care\/en\/2026\/05\/13\/core-data-set-module-document\/\">Core Dataset (CDS) Module &#8220;Document&#8221;<\/a>, which was developed in collaboration with the <a href=\"https:\/\/www.medizininformatik-initiative.de\/en\/node\/1164\" target=\"_blank\" rel=\"noreferrer noopener\">Digital Hub MiHUB<\/a>.<\/p>\n\n\n\n<p>The CDS module enables clinical text documents processed within GeMTeX to be integrated from the Data Integration Centers into the German Portal for Medical Research Data (FDPG). Both de-identified and semantically annotated documents can therefore be provided in a standardized format for research and clinical use. Initial results of these developments have already been presented at the international <a href=\"https:\/\/lrec2026.info\/\" target=\"_blank\" rel=\"noreferrer noopener\">Language Resources and Evaluation Conference (LREC)<\/a> and <a href=\"https:\/\/mie2026.efmi.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">Medical Informatics Europe (MIE)<\/a> conference.<\/p>\n\n\n\n<p>In addition, the GeMTeX team is preparing a first prototype use case in collaboration with the German National Library of Medicine (ZB MED). Ethical approval has already been obtained, and all six participating university hospitals have granted approval through their respective Use and Access Committees. 
As a result, the text corpus created within the project will soon be available for scientific research projects upon request and under defined conditions.<\/p>\n\n\n\n<p><strong>LLM Workshop Highlighted the Potential of AI Language Models for Medicine<\/strong><\/p>\n\n\n\n<p>On the preceding day, project members discussed current research and practical applications of Large Language Models (LLMs) for processing clinical documents during an internal GeMTeX LLM Workshop. LLMs are AI-based language models that learn from large volumes of text and can, for example, generate coherent text independently.<\/p>\n\n\n\n<p>The workshop covered a range of topics, including:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Methods for the de-identification of sensitive information<\/li>\n\n\n\n<li>Automated recognition of clinical entities<\/li>\n\n\n\n<li>Comparisons between synthetically generated and authentic medical history dialogues<\/li>\n\n\n\n<li>Structured processing of clinical practice guidelines<\/li>\n\n\n\n<li>The use of LLM-supported software in research<\/li>\n<\/ul>\n\n\n\n<p><br>The workshop demonstrated that language models have considerable potential to support many tasks in clinical text processing. At the same time, it highlighted the importance of rigorous scientific evaluation and robust data protection frameworks\u2014both of which are key objectives of the GeMTeX project.<\/p>\n\n\n\n<p>The final GeMTeX project meeting will take place on October 20, 2026, at the Albertina Library of Leipzig University.<\/p>\n\n\n\n<p><strong>Recently published:<\/strong><\/p>\n\n\n\n<p>Jakob Faller, Marcel Susky, Noemi Deppenwiese, Justin Hofenbitzer, Christina Lohr, Thomas Ganslandt, Martin Boeker, Frank Meineke. <a href=\"https:\/\/ebooks.iospress.nl\/doi\/10.3233\/SHTI260389\" target=\"_blank\" rel=\"noreferrer noopener\">Standardized Information Model for Clinical Texts: The MII Core Data Set Module Document<\/a>. <em>Stud Health Technol Inform<\/em>. 
2026 May 21;336:1202-1206. DOI: 10.3233\/SHTI260389.<\/p>\n\n\n\n<p>Christina Lohr, Marvin Seiferling, Philipp Wiesenbach, Jakob Faller, Christoph Dieterich. <a href=\"https:\/\/ebooks.iospress.nl\/doi\/10.3233\/SHTI260440\" target=\"_blank\" rel=\"noreferrer noopener\">The SURROGATOR Framework for Context-Aware Surrogation of Privacy Sensitive Information in Medical Text<\/a>. <em>Stud Health Technol Inform<\/em>. 2026 May 21;336:1405-1409. DOI: 10.3233\/SHTI260440. [<a href=\"https:\/\/chlor.github.io\/materials\/202605_mie_surrogator\/20260526_GeMTeX_Surrogator.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Slides<\/a>] [<a href=\"https:\/\/github.com\/medizininformatik-initiative\/GeMTeX\/tree\/main\/surrogator\" target=\"_blank\" rel=\"noreferrer noopener\">Code SURROGATOR<\/a>] [<a href=\"https:\/\/github.com\/dieterich-lab\/SurrogatorEval\" target=\"_blank\" rel=\"noreferrer noopener\">Code Evaluation<\/a>]<\/p>\n\n\n\n<p>Justin Hofenbitzer, Christina Lohr, Andrea Riedel, Rebekka Kiser, Aliaksandra Shutsko, Abanoub Abdelmalak, Peter Kl\u00fcgl, Jutta Romberg, Sarah Riepenhausen, Miriam Schechner, Jakob Faller, Frank Meineke, Luise Modersohn, Markus L\u00f6ffler, Juliane Fluck, Udo Hahn, Stefan Schulz, Martin Boeker. <a href=\"https:\/\/lrec.elra.info\/lrec2026-main-122\" target=\"_blank\" rel=\"noreferrer noopener\">Developing the German Medical Text Corpus (GeMTeX): Legal Compliance and Semantic Enrichment<\/a>. In <em>Proceedings of the Fifteenth Language Resources and Evaluation Conference <\/em>(LREC 2026) (pp. 1571\u20131584). <em>European Language Resources Association (ELRA).<\/em> DOI: 10.63317\/4eqiegnqbu96.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>From June 22 to 23, 2026, GeMTeX project staff gathered at TUM University Hospital Rechts der Isar for an internal large language model (LLM) workshop and plenary meeting. 
The meeting was held to discuss the project\u2019s current progress and prepare for the final phase of work.<\/p>\n","protected":false},"author":14,"featured_media":26288,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-26281","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news"],"_links":{"self":[{"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/posts\/26281","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/users\/14"}],"replies":[{"embeddable":true,"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/comments?post=26281"}],"version-history":[{"count":3,"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/posts\/26281\/revisions"}],"predecessor-version":[{"id":26285,"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/posts\/26281\/revisions\/26285"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/media\/26288"}],"wp:attachment":[{"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/media?parent=26281"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/categories?post=26281"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/tags?post=26281"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}