{"id":26205,"date":"2026-05-13T08:00:00","date_gmt":"2026-05-13T06:00:00","guid":{"rendered":"https:\/\/www.smith.care\/?p=26205"},"modified":"2026-05-13T11:24:55","modified_gmt":"2026-05-13T09:24:55","slug":"core-data-set-module-document","status":"publish","type":"post","link":"https:\/\/www.smith.care\/en\/2026\/05\/13\/core-data-set-module-document\/","title":{"rendered":"Standardizing Clinical Documents: The New Core Dataset Module \u201cDocument\u201d"},"content":{"rendered":"\n<p>The computer-assisted analysis of clinical texts such as discharge letters or surgical reports is becoming increasingly important for medical research due to advances in natural language processing (NLP) and large language models (LLMs). To make these unstructured data usable across institutions, the \u201cDocument\u201d core <a href=\"https:\/\/www.medizininformatik-initiative.de\/en\/medical-informatics-initiatives-core-data-set\" target=\"_blank\" rel=\"noreferrer noopener\">data set (CDS) module<\/a> was developed within the framework of the <a href=\"https:\/\/www.medizininformatik-initiative.de\/en\/start\" target=\"_blank\" rel=\"noreferrer noopener\">Medical Informatics Initiative (MII)<\/a>. The module is designed to formally represent the link between the actual text content and its descriptive metadata.<\/p>\n\n\n\n<p><strong>Technical Foundations and Compatibility<\/strong><\/p>\n\n\n\n<p>Technically, the module is based on the international standard HL7 FHIR (<a href=\"https:\/\/www.hl7.org\/fhir\/\" target=\"_blank\" rel=\"noreferrer noopener\">Fast Healthcare Interoperability Resources<\/a>), specifically the DocumentReference resource. 
The modeling aimed for a high degree of compatibility with established German standards, particularly the models of the National Association of Statutory Health Insurance Physicians (KBV) and the <a href=\"https:\/\/fachportal.gematik.de\/zielgruppen\/primaersystemhersteller\/isik\" target=\"_blank\" rel=\"noreferrer noopener\">ISiK specifications<\/a> of Gematik. While these standards primarily support healthcare provision and data exchange with hospital information systems, the CDS module \u201cDocument\u201d focuses specifically on secondary use in research.<\/p>\n\n\n\n<p><strong>Structure and Special Features of the Module<\/strong><\/p>\n\n\n\n<p>The Clinical Document Classes List (KDL) is recommended for the standardized classification of documents. Other CDS modules, such as \u201cPerson\u201d and \u201cCase,\u201d provide the medical context of the document. One notable feature is the integrated NLP status extension, which tracks a document\u2019s processing status, such as whether it has already been de-identified or annotated.<\/p>\n\n\n\n<p><strong>Development and Governance<\/strong><\/p>\n\n\n\n<p>The development process was driven by an interdisciplinary team of experts and coordinated by the Core Dataset Taskforce and the MII Interoperability Working Group. For the technical implementation, tools such as FHIR Shorthand (FSH) were used to formally define the profiles and publish them for users in an Implementation Guide on the \u201c<a href=\"https:\/\/simplifier.net\/\" target=\"_blank\" rel=\"noreferrer noopener\">Simplifier<\/a>\u201d platform.<\/p>\n\n\n\n<p><strong>Cross-Project Collaboration<\/strong><\/p>\n\n\n\n<p>The module was significantly advanced through requirements arising from the <a href=\"https:\/\/www.smith.care\/en\/gemtex_mii\/about-gemtex\/\" target=\"_blank\" rel=\"noreferrer noopener\">GeMTeX (German Medical Text Corpus)<\/a> project. 
The project aims to establish a nationwide corpus of clinical routine texts in Germany. Stakeholders from different contexts collaborated on this effort, with support from the <a href=\"https:\/\/mihubx.de\/\" target=\"_blank\" rel=\"noreferrer noopener\">MiHUBx<\/a> (Medical Informatics Hub in Saxony \u2013 since January 2026 \u201cMiHUB\u201d) project to strengthen interoperability between sites. Synergies with projects from the <a href=\"https:\/\/www.netzwerk-universitaetsmedizin.de\/en\" target=\"_blank\" rel=\"noreferrer noopener\">Network of University Medicine (NUM)<\/a> are also contributing to the long-term harmonization of data structures.<\/p>\n\n\n\n<p><strong>Importance for the Research Infrastructure<\/strong><\/p>\n\n\n\n<p>The module is of central importance for the Data Integration Centers and for research, as it forms the foundation for providing text data via the <a href=\"https:\/\/forschen-fuer-gesundheit.de\/en\/\" target=\"_blank\" rel=\"noreferrer noopener\">German Portal for Medical Research Data (FDPG)<\/a>. Researchers can thus query for patient cohorts for which specific document types are available in a defined processing state, and then apply for data export.<\/p>\n\n\n\n<p>On May 28, Jakob Faller from University Hospital Erlangen will present the development of the module at the <a href=\"https:\/\/mie2026.efmi.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">Medical Informatics Europe Conference<\/a> in Genoa. His contribution is part of the session \u201cInfrastructures and Regulations\u201d from 8:30 a.m. to 10:00 a.m.<\/p>\n\n\n\n<p><em>Text: Dr. 
Frank Meineke | Institute for Medical Informatics, Statistics and Epidemiology, Faculty of Medicine, Leipzig University<\/em><\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The computer-assisted analysis of clinical texts such as discharge letters or surgical reports is becoming increasingly important for medical research due to advances in natural language processing (NLP) and large language models (LLMs). To make these unstructured data usable across institutions, the \u201cDocument\u201d core data&#8230;<\/p>\n","protected":false},"author":14,"featured_media":26203,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-26205","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news"],"_links":{"self":[{"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/posts\/26205","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/users\/14"}],"replies":[{"embeddable":true,"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/comments?post=26205"}],"version-history":[{"count":2,"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/posts\/26205\/revisions"}],"predecessor-version":[{"id":26207,"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/posts\/26205\/revisions\/26207"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/media\/26203"}],"wp:attachment":[{"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/media?parent=26205"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/categories?post=26205"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.smith
.care\/en\/wp-json\/wp\/v2\/tags?post=26205"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}