{"id":23866,"date":"2024-10-04T11:56:42","date_gmt":"2024-10-04T09:56:42","guid":{"rendered":"https:\/\/www.smith.care\/?p=23866"},"modified":"2024-10-04T12:12:18","modified_gmt":"2024-10-04T10:12:18","slug":"gemtex-de-identification","status":"publish","type":"post","link":"https:\/\/www.smith.care\/en\/2024\/10\/04\/gemtex-de-identification\/","title":{"rendered":"GeMTeX creates first standard for de-identification of German medical documents"},"content":{"rendered":"\n<p>In the <a href=\"https:\/\/www.smith.care\/en\/gemtex_mii\/about-gemtex\/\">GeMTeX<\/a> project of the <a href=\"https:\/\/www.medizininformatik-initiative.de\/en\/start\" target=\"_blank\" rel=\"noreferrer noopener\">Medical Informatics Initiative (MII)<\/a>, an interdisciplinary team is working to make texts from clinical routine care available for research and clinical use. The goal is to create one of the largest datasets for automatic processing of German-language medical texts. The GeMTeX team has now reached an important milestone: for the first time, researchers from the University Hospitals of Leipzig and Erlangen have published annotations for a text corpus, which serve as a template for the de-identification of German medical texts. Annotations are markers for text passages that provide metadata about the content. These markers make the texts usable for applications such as artificial intelligence and large language models.<\/p>\n\n\n\n<p><strong>Pilot study on the annotation of personal health information<\/strong><\/p>\n\n\n\n<p>In the process of de-identification, data that could allow conclusions to be drawn about individuals is rendered unrecognizable. In a pilot study, medical students from the Universities of Leipzig and Erlangen, together with a team of experts from the fields of linguistics, medicine and computer science, annotated 1,438 fictitious doctor&#8217;s letters. 
The letters were taken from the Graz Synthetic Text Corpus (GRASCCO).<\/p>\n\n\n\n<p>During the annotation process, the GeMTeX team focused on text passages containing sensitive information such as names, dates, addresses, or professions. These annotations make it possible to adapt software to automatically detect and encrypt personal health information in clinical documents.<\/p>\n\n\n\n<p><strong>An example of privacy-compliant processing of medical texts<\/strong><\/p>\n\n\n\n<p>The annotated documents have been published on the international research data platform <a href=\"https:\/\/zenodo.org\/records\/11502329\">Zenodo<\/a> and are intended to serve as a template for future projects. Together with the annotated corpus, the team published a paper that describes a procedure for de-identification of medical documents. The de-identification pipeline consists of the following steps:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Exporting the clinical texts as raw data from the local hospital information system<\/li>\n\n\n\n<li>Importing the texts into the INCEpTION annotation platform<\/li>\n\n\n\n<li>Automated pre-annotation of relevant text passages with identifying information by the Averbis Health Discovery Pipeline<\/li>\n\n\n\n<li>Manual review and correction of annotations using the dual control principle<\/li>\n\n\n\n<li>Automated replacement of pre-annotated and corrected data with appropriate pseudonyms (see figure)<\/li>\n<\/ul>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"alignleft size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"506\" src=\"https:\/\/www.smith.care\/wp-content\/uploads\/2024\/10\/Grafik_Gemtex-de-identification_EN-1024x506.png\" alt=\"Pipeline for De-Identification in GeMTeX.\" class=\"wp-image-23879\" style=\"width:1130px;height:auto\" srcset=\"https:\/\/www.smith.care\/wp-content\/uploads\/2024\/10\/Grafik_Gemtex-de-identification_EN-1024x506.png 1024w, 
https:\/\/www.smith.care\/wp-content\/uploads\/2024\/10\/Grafik_Gemtex-de-identification_EN-300x148.png 300w, https:\/\/www.smith.care\/wp-content\/uploads\/2024\/10\/Grafik_Gemtex-de-identification_EN-768x380.png 768w, https:\/\/www.smith.care\/wp-content\/uploads\/2024\/10\/Grafik_Gemtex-de-identification_EN-1536x760.png 1536w, https:\/\/www.smith.care\/wp-content\/uploads\/2024\/10\/Grafik_Gemtex-de-identification_EN-2048x1013.png 2048w, https:\/\/www.smith.care\/wp-content\/uploads\/2024\/10\/Grafik_Gemtex-de-identification_EN-700x346.png 700w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Process of de-identification in the GeMTeX project. \u00a9 SMITH Office\/GeMTeX<\/figcaption><\/figure><\/div>\n\n\n<p>The results of the pilot study have been summarized in a publication.<\/p>\n\n\n\n<p><a href=\"https:\/\/pubmed.ncbi.nlm.nih.gov\/39234720\/\" target=\"_blank\" rel=\"noreferrer noopener\">Link to the publication in PubMed<\/a><br><a href=\"https:\/\/zenodo.org\/records\/11502329\" target=\"_blank\" rel=\"noreferrer noopener\">Link to the annotated text corpus and annotation guidelines (Zenodo)<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The GeMTeX team has reached a significant milestone: Researchers from the University Hospitals of Leipzig and Erlangen have, for the first time, published annotations for a text corpus that serve as a template for the de-identification of German-language medical texts. 
This lays the groundwork for the use of medical text, for example, in training AI applications.<\/p>\n","protected":false},"author":19,"featured_media":23859,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-23866","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news"],"_links":{"self":[{"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/posts\/23866","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/users\/19"}],"replies":[{"embeddable":true,"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/comments?post=23866"}],"version-history":[{"count":6,"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/posts\/23866\/revisions"}],"predecessor-version":[{"id":23888,"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/posts\/23866\/revisions\/23888"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/media\/23859"}],"wp:attachment":[{"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/media?parent=23866"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/categories?post=23866"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.smith.care\/en\/wp-json\/wp\/v2\/tags?post=23866"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}