1. Strengthening the integration of healthcare and clinical research
2. Improving individual patient care through innovative information technologies and new opportunities for data handling based on process and interoperability standards
3. Promoting a new culture of data sharing
4. Enabling patients to participate actively in healthcare and research
5. Adapting existing curricula to the new challenges

These objectives will be achieved by five components:


SMITH will establish closely cooperating data integration centers (DICs) at all three sites. Their tasks include, among other things, the extraction, curation and standardized storage of relevant data from existing clinical IT systems, as well as the preparation, consenting and organization of data use and access in the role of data broker and trustee. Further tasks are the reintegration of findings into healthcare and the provision of access to these data for clinical and research staff. To implement these tasks, the DICs will maintain electronic health records and synchronizable metadata repositories to ensure semantic interoperability. The development of the DICs also includes the provision of an IT architecture for the extraction, structuring, storage and transfer of data. In the future, structured documentation and data exchange will increasingly start directly at the clinical IT source systems.

Furthermore, a substantial component of the structure will be the active participation of patients: they will be informed about on-site and cross-institutional research projects and will be able to consent to data use and access on a project-by-project basis.

Within the consortium, technical and semantic interoperability will be established through close cooperation with non-university research institutions and companies. These partners are willing to contribute considerable resources to the success of SMITH.

A second component of our concept is the establishment of a phenotyping pipeline. Its task is to support the analysis of patient data based on selected requests specified by clinicians and researchers. This requires translating these requests into algorithms and building data records “refined” by annotation machines, which can then be used for clinical, epidemiological and health-economic questions. To this end, we will prepare readable documents (discharge letters, diagnostic results, medications, billing data) from the hospital information systems and extract diagnoses, findings, medications, adverse reactions, lab data etc. from them using methods of natural language processing and semantic text analysis. In the concept phase, SMITH has already built a cross-site text corpus of 3,000 patient-related documents for training purposes.
In a second step, specific phenotypes will be developed and classified. The result is a large, curated data space of structured patient information that can be used for care optimization and clinical research. These data can then be enriched, e.g. with organizational information on any available biobank samples, on existing or possible inclusion in clinical trials, or on additional, non-routine data collections related to the patient.

Building on the phenotyping pipeline, we will also pursue two clinical use cases to demonstrate the performance of the IT infrastructure to be installed at the DICs. Both use cases are tailored to the internationally recognized medical expertise at the three sites.


The first use case deals with the topic “Antibiotic Stewardship” in infectious disease medicine. The focus is on the targeted, guideline-compliant use of antibiotics to combat bacterial infections, taking into account the limited number of trained infectious disease specialists in Germany. The use case will be implemented on general wards and in intensive care units. The aim is to use IT to optimize infectious disease consultations so that antibiotics are used in a more targeted way in patients with bacterial infections. In addition to directly improving patient care, this also contributes indirectly to the prevention of multidrug resistance.

The second use case is limited to intensive care units. By means of continuous analyses of data from the patient data management system (PDMS), a model-based “algorithmic surveillance” of the state of critically ill patients will be established. This system serves as the basis for early alerting in order to initiate diagnostic and therapeutic interventions faster. The data analyses require high-performance computing, and the results will be annotated for clinical decision making.

Both use cases will be implemented at all three sites. In a later phase, they will be rolled out to other healthcare providers.

As a fifth component, we have developed a training and continuing education program that provides similar, modular curricula at the three sites for the M.Sc. in medical informatics / medical data science and for postgraduate education. In addition, there are courses for medical students and physicians to prepare them for a future with large, high-quality data sets that are subject to strict data protection and privacy requirements.