Through the Medical Informatics Initiative (MII) and the establishment of the Data Integration Centers (DIC), clinical health-care data from the various sources of the Hospital Information System (HIS) are made available for medical research. This creates a unique and rich repository of clinical data that is precisely defined across all participating sites. With the methodological use case Phenotyping Pipeline, PheP for short, the SMITH Consortium supports building, qualitatively enriching and evaluating this data pool. Leipzig University is in charge of the project.
PheP is a platform that enables clinical researchers, statisticians and computer scientists to collaborate across disciplines on scientific questions that previously seemed economically and technologically out of reach. To this end, data sets must be built that are suitable for clinical-epidemiological and health-economic questions.
Phenotypes are determinable characteristics of patients. Through phenotyping, further characteristics can be derived from them and made available. PheP also supports record linkage, a procedure that combines data on a patient from different information sources, for example data from health insurance companies or death data from civil registers.
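The two ideas in this paragraph can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from PheP: the record fields, the function names, the choice of body mass index as the derived characteristic, and the simple hashed linkage key. Real record linkage would typically use keyed or otherwise privacy-preserving methods rather than a plain hash.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ClinicalRecord:
    # Hypothetical fields chosen for illustration only.
    given_name: str
    family_name: str
    birth_date: str  # ISO 8601 date, e.g. "1950-04-12"
    height_m: float
    weight_kg: float


def derive_bmi(record: ClinicalRecord) -> float:
    """Derived phenotype: body mass index from two measured characteristics."""
    return record.weight_kg / record.height_m ** 2


def is_obese(record: ClinicalRecord) -> bool:
    """Phenotype derived from another derived phenotype (BMI of 30 or more)."""
    return derive_bmi(record) >= 30.0


def linkage_key(given_name: str, family_name: str, birth_date: str) -> str:
    """Deterministic linkage key: SHA-256 over normalized identifying fields."""
    normalized = "|".join(
        part.strip().lower() for part in (given_name, family_name, birth_date)
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def link_death_date(
    record: ClinicalRecord, registry: Dict[str, str]
) -> Optional[str]:
    """Look up a date of death in an external source indexed by linkage key."""
    key = linkage_key(record.given_name, record.family_name, record.birth_date)
    return registry.get(key)
```

In this sketch the external source (the hypothetical `registry`) never needs the clinical data itself; both sides only have to compute the same key from the same normalized fields.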
One of the challenges in this context is that too little clinical information is available as machine-readable data. Admission letters, findings and operating room reports in particular contain valuable information such as diagnoses, medications, side effects and laboratory data that can only be extracted with natural language processing (NLP) and semantic text analysis. NLP is therefore used to process documents from the HIS. This work is led academically by the Jena University Language & Information Engineering Lab (JULIE Lab) in collaboration with leading companies in the field of language processing.
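To give a rough idea of what turning free text into machine-readable data means, the following Python sketch pulls medication mentions out of a sentence. It is a deliberately simple stand-in based on a regular expression and a tiny hand-made lexicon, not the methods used by the JULIE Lab or in PheP. The lexicon, the pattern and the function name are assumptions made for illustration.

```python
import re
from typing import Dict, List

# Hypothetical mini-lexicon; real systems rely on terminologies and trained models.
MEDICATION_LEXICON = {"metformin", "ramipril", "simvastatin"}

# A word followed by a number and a unit, such as "Metformin 500 mg".
DOSE_PATTERN = re.compile(
    r"\b([A-Za-z]+)\s+(\d+(?:\.\d+)?)\s*(mg|g|ml)\b", re.IGNORECASE
)


def extract_medications(text: str) -> List[Dict[str, str]]:
    """Return structured medication mentions found in free text."""
    found = []
    for name, amount, unit in DOSE_PATTERN.findall(text):
        # Keep only candidates whose name is a known medication.
        if name.lower() in MEDICATION_LEXICON:
            found.append(
                {"drug": name.lower(), "dose": amount, "unit": unit.lower()}
            )
    return found
```

A pattern like this breaks down quickly on negations, abbreviations and spelling variants, which is one reason clinical text generally calls for dedicated NLP methods.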
PheP focuses on supporting the development and standardized introduction of new Data Use Projects (DUPs). DUPs serve a variety of tasks: quality assurance in health care, linking with external data, dynamic enrichment of the data pool, scientific hypothesis generation and statistical analysis of medical questions. We call the bundling of these processes the PheP Factory.
The technical basis is a platform set up at all sites: the PheP Engine. This secure technology enables distributed analyses to be executed on the semantically and technically standardized data at all sites. Sensitive patient data remain in the clinic; the algorithms come to the data. This allows a flexible approach to different clinical questions that complies with data protection requirements.
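The principle that the algorithms come to the data can be sketched in Python as follows. This is an illustrative assumption about how such a distributed analysis might look, not the PheP Engine's actual interface: each site runs the same function locally and returns only aggregate values, and a coordinator pools those aggregates. The names `SiteAggregate`, `local_analysis` and `pool` are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Iterable, List, Tuple


@dataclass
class SiteAggregate:
    # Only these summary values leave a site, never patient-level rows.
    n: int
    total: float
    total_sq: float


def local_analysis(values: Iterable[float]) -> SiteAggregate:
    """Runs inside one site on its own patient-level measurements."""
    n, total, total_sq = 0, 0.0, 0.0
    for value in values:
        n += 1
        total += value
        total_sq += value * value
    return SiteAggregate(n, total, total_sq)


def pool(aggregates: List[SiteAggregate]) -> Tuple[float, float]:
    """Runs at the coordinator: overall mean and sample standard deviation.

    Assumes at least two observations in total across all sites.
    """
    n = sum(a.n for a in aggregates)
    total = sum(a.total for a in aggregates)
    total_sq = sum(a.total_sq for a in aggregates)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    # Guard against tiny negative values caused by floating-point rounding.
    return mean, sqrt(max(variance, 0.0))
```

Because the pooled mean and standard deviation are computed from counts and sums alone, the result matches what a single analysis over all patients would give, although no individual values were exchanged.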
The PheP Concept also forms the basis of the cross-MII Use Case POLAR (Polypharmacy, Drug Interactions and Risks), which was launched in early 2020 and involves all four consortia of the MII.
“SMITH focuses on the current challenges of digitization. Through the sustainable use of care data in medical research, decisive steps are taken to improve diagnosis, prevention and therapy.
With these steps, health care can be taken to a new level.”
Prof. Dr. Markus Löffler
Head of the SMITH Consortium
Head of the PheP Use Case
Institute for Medical Informatics, Statistics and Epidemiology (IMISE) | Leipzig University