1. Integration of healthcare and clinical research
2. Improvement of individual patient care through innovative information technologies and new data-handling opportunities based on interoperability standards
3. Promotion of a new culture of data sharing
4. Enabling patients to actively participate in healthcare and research
5. Adaptation of existing curricula to the new challenges (M.Sc. in medical informatics/medical data science, and postgraduate education)

These objectives will be achieved through five components:


SMITH will establish closely cooperating data integration centers (DICs) at all participating university hospitals. Their tasks will include extracting, curating and storing relevant data from existing clinical IT systems in standardized form, and acting as the data broker and trustee that prepares and organizes data use. Further tasks are using the findings to improve healthcare and giving clinical and research staff access to the data. To carry out these tasks, the DICs will maintain electronic health records and synchronizable metadata repositories to ensure semantic interoperability. The DICs' goals also include developing an IT architecture for the extraction, structuring, storage and transfer of data. In the future, structured documentation and data exchange will increasingly start directly at the clinical IT source systems.
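As an illustration of the standardization step, the following minimal Python sketch maps lab results from two hypothetical source systems onto a shared terminology code via a metadata repository. The system names (`LAB_SYS_A`, `LAB_SYS_B`), the local codes and the `standardize` function are invented for this example; the LOINC codes are real, but a production DIC would rely on a terminology service rather than an in-memory dictionary.

```python
from dataclasses import dataclass

# Hypothetical metadata repository: (source system, local code) -> standard code.
# Local codes and system names are invented; the LOINC codes are real examples.
METADATA_REPOSITORY = {
    ("LAB_SYS_A", "CREA"): ("http://loinc.org", "2160-0", "mg/dL"),
    ("LAB_SYS_B", "K-CREA"): ("http://loinc.org", "2160-0", "mg/dL"),
    ("LAB_SYS_A", "HB"): ("http://loinc.org", "718-7", "g/dL"),
}


@dataclass(frozen=True)
class Observation:
    """A lab result expressed in the shared, standardized vocabulary."""
    patient_id: str
    system: str
    code: str
    value: float
    unit: str


def standardize(source: str, patient_id: str, local_code: str,
                value: float, unit: str) -> Observation:
    """Map one source-system lab result to a standardized observation."""
    try:
        system, code, expected_unit = METADATA_REPOSITORY[(source, local_code)]
    except KeyError:
        raise ValueError(f"No mapping for {local_code!r} from {source!r}") from None
    if unit != expected_unit:
        # Reject instead of silently converting; unit handling is out of scope here.
        raise ValueError(f"Unexpected unit {unit!r}, expected {expected_unit!r}")
    return Observation(patient_id, system, code, value, unit)


a = standardize("LAB_SYS_A", "p1", "CREA", 1.1, "mg/dL")
b = standardize("LAB_SYS_B", "p2", "K-CREA", 0.9, "mg/dL")
print(a.code == b.code)  # True: both sites now share one code
```

Rejecting unknown codes and unexpected units, rather than guessing, keeps unmapped data visible so it can be curated.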

Furthermore, a substantial component of the structure will be patient participation, implemented by informing patients about on-site and cross-institutional research projects and by giving them the option to consent to the use of their individual data.
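The consent check that precedes any data release could look roughly like the sketch below. It assumes a hypothetical consent record with named scopes, an optional expiry date and a withdrawal flag; the scope names and functions are illustrative, and actual consent documents and their legal interpretation are considerably more detailed.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class Consent:
    """Hypothetical consent record for one patient."""
    patient_id: str
    scopes: set[str] = field(default_factory=set)  # e.g. {"secondary_use"}
    valid_until: date | None = None  # None means no expiry recorded
    withdrawn: bool = False


def may_release(consent: Consent | None, scope: str, on: date) -> bool:
    """True only if a valid, non-withdrawn consent covers the requested scope."""
    if consent is None or consent.withdrawn:
        return False
    if consent.valid_until is not None and on > consent.valid_until:
        return False
    return scope in consent.scopes


def filter_for_project(patient_ids, consents, scope, on):
    """Keep only patients whose consent permits the requested use."""
    return [pid for pid in patient_ids if may_release(consents.get(pid), scope, on)]


consents = {
    "p1": Consent("p1", {"secondary_use"}),
    "p2": Consent("p2", {"secondary_use"}, withdrawn=True),
    "p3": Consent("p3", {"recontact"}),
}
print(filter_for_project(["p1", "p2", "p3", "p4"], consents, "secondary_use", date(2024, 1, 1)))
# ['p1']
```

Patients without any consent record (`p4`) are excluded by default, which is the conservative choice for a data trustee.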

In the consortium, technical and semantic interoperability will be established in cooperation with non-university research and business partners. These partners are willing to contribute considerable resources to help SMITH succeed.

A second component of our concept is the establishment of a phenotyping pipeline. Its objective is to allow clinicians and researchers to specify analyses of patient data. This requires translating such requests into algorithms and building data records, refined by automated annotation, that can be used to answer clinical, epidemiological and health-economic questions. To achieve this, we use semantic text analysis methods to process human-readable documents (admission notes, diagnostic results, medication data, billing data) from the hospital information systems and to extract diagnoses, findings, medications, adverse reactions, lab values, etc. In the concept phase, SMITH has already built a cross-site corpus of three thousand patient-related documents for training purposes.
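To make the extraction step concrete, here is a deliberately simple dictionary-based annotator in Python. It is a stand-in, not the semantic text analysis used in SMITH: the lexicon, concept names and the `annotate` function are hypothetical, and real pipelines must also handle negation, abbreviations, spelling variants and language-specific morphology.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical mini-lexicon: surface form -> (semantic type, normalized concept).
LEXICON = {
    "amoxicillin": ("MEDICATION", "amoxicillin"),
    "pip/tazo": ("MEDICATION", "piperacillin/tazobactam"),
    "pneumonia": ("DIAGNOSIS", "pneumonia"),
    "rash": ("ADVERSE_REACTION", "rash"),
}


@dataclass(frozen=True)
class Annotation:
    start: int
    end: int
    text: str
    sem_type: str
    concept: str


# One alternation over all terms, longest first so longer terms beat their
# prefixes; \b restricts matches to word boundaries.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in sorted(LEXICON, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def annotate(note: str) -> list[Annotation]:
    """Return all lexicon matches in the note with character offsets."""
    result = []
    for m in _PATTERN.finditer(note):
        sem_type, concept = LEXICON[m.group(1).lower()]
        result.append(Annotation(m.start(), m.end(), m.group(1), sem_type, concept))
    return result


note = "Admitted with pneumonia, started on Pip/Tazo; rash after amoxicillin."
for ann in annotate(note):
    print(ann.sem_type, ann.concept)
```

Keeping character offsets alongside the normalized concept lets annotations be checked against the source document, which matters when a training corpus is built and reviewed.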

In a second step, specific phenotypes will be developed and classified. This will gradually produce a large, curated data space of structured patient information that can be used to optimize patient care and improve research. These data can then be enriched, for example with organizational information on any available biobank samples, with options for inclusion in clinical trials, or with additional, non-routine data collected about the patient.
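A phenotype can be thought of as an executable rule over the curated data. The sketch below assumes a hypothetical `PatientRecord` with ICD-10 codes, HbA1c values and a biobank-sample count, and defines an illustrative type 2 diabetes phenotype (an E11.* code or an HbA1c of at least 6.5 %, a commonly used diagnostic threshold). It is not a validated phenotype definition; it only shows how a rule and a simple enrichment flag fit together.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Hypothetical curated record for one patient."""
    patient_id: str
    icd10_codes: set[str] = field(default_factory=set)
    hba1c_percent: list[float] = field(default_factory=list)
    biobank_samples: int = 0  # organizational enrichment: stored samples


def has_type2_diabetes(p: PatientRecord) -> bool:
    """Illustrative rule: an E11.* diagnosis code or any HbA1c >= 6.5 %."""
    coded = any(code.startswith("E11") for code in p.icd10_codes)
    lab = any(v >= 6.5 for v in p.hba1c_percent)
    return coded or lab


def build_cohort(records):
    """Return (patient_id, has_biobank_sample) for everyone matching the phenotype."""
    return [(p.patient_id, p.biobank_samples > 0) for p in records if has_type2_diabetes(p)]


records = [
    PatientRecord("p1", {"E11.9"}),
    PatientRecord("p2", hba1c_percent=[5.4, 6.8], biobank_samples=2),
    PatientRecord("p3", {"I10"}, [5.6]),
]
print(build_cohort(records))  # [('p1', False), ('p2', True)]
```

Combining a coded criterion with a lab criterion catches patients whose diagnosis was never coded, which is one reason phenotypes are built from several data sources.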

On the basis of the phenotyping pipeline, two clinical use cases will also be pursued. They are intended to demonstrate the performance of the installed DIC IT infrastructure. Both use cases are tailored to the internationally recognized medical expertise at the sites.


The first use case concerns “Antibiotic Stewardship” in infectious disease medicine. It focuses on the targeted, guideline-compliant use of antibiotics to treat bacterial infections, given the limited number of trained infectious disease specialists in Germany. The use case will be implemented on normal wards and in intensive care units. The aim is to use IT to optimize the case conferences in which patient care is discussed, leading to more effective antibiotic therapy for patients with bacterial infections. Beyond directly improving patient care, this also indirectly helps prevent the emergence of multi-resistant bacteria.
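One way IT could support the stewardship case conferences is by preparing a worklist of antibiotic therapies that are due for review. In the sketch below, the 72-hour review interval, the `AntibioticOrder` record and the ranking by overdue time are all assumptions chosen for illustration, not SMITH's actual rules.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AntibioticOrder:
    """Hypothetical running antibiotic therapy."""
    patient_id: str
    drug: str
    started: datetime
    last_review: datetime | None  # None = no documented review yet


def review_worklist(orders, now, review_after=timedelta(hours=72)):
    """List orders not reviewed within `review_after` of the last documented
    review (or of the start of therapy if none), most overdue first."""
    due = []
    for o in orders:
        reference = o.last_review or o.started
        overdue = now - reference - review_after
        if overdue > timedelta(0):
            due.append((overdue, o))
    due.sort(key=lambda pair: pair[0], reverse=True)
    return [o for _, o in due]


now = datetime(2024, 3, 10, 8, 0)
orders = [
    AntibioticOrder("p1", "meropenem", datetime(2024, 3, 5, 8, 0), None),
    AntibioticOrder("p2", "ceftriaxone", datetime(2024, 3, 6, 8, 0), datetime(2024, 3, 9, 8, 0)),
    AntibioticOrder("p3", "amoxicillin", datetime(2024, 3, 6, 20, 0), None),
]
print([o.patient_id for o in review_worklist(orders, now)])  # ['p1', 'p3']
```

Ranking by how long a review is overdue lets the few available specialists start with the cases most in need of attention.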

The second use case is limited to intensive care units. Continuous analysis of data from the patient data management system (PDMS) will be used to establish model-based “algorithmic surveillance” of critically ill patients. The system serves as an early warning that triggers faster diagnostic and therapeutic interventions. The data will be analyzed using high-performance computing, producing annotated results to support clinical decision-making.
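The alerting idea can be illustrated with a rule that fires when a monitored value stays below a threshold for several consecutive readings. This is a simple stand-in for the model-based surveillance described above: the class name, the 65 mmHg mean arterial pressure threshold and the three-reading window are hypothetical parameters.

```python
from collections import deque


class PersistentLowAlert:
    """Hypothetical alert rule: fire when `window` consecutive readings
    fall below `threshold`. A simple stand-in for a model-based score."""

    def __init__(self, threshold: float, window: int):
        self.threshold = threshold
        self.recent = deque(maxlen=window)  # keeps only the last `window` values

    def update(self, value: float) -> bool:
        """Add a new reading and report whether the alert condition holds."""
        self.recent.append(value)
        return (len(self.recent) == self.recent.maxlen
                and all(v < self.threshold for v in self.recent))


monitor = PersistentLowAlert(threshold=65.0, window=3)  # mean arterial pressure, mmHg
stream = [72, 66, 63, 61, 64, 70]
alerts = [monitor.update(v) for v in stream]
print(alerts)  # [False, False, False, False, True, False]
```

Requiring several consecutive low readings suppresses alerts from single artifacts; a real system would track each patient and parameter separately and combine many signals in a statistical model.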

Both use cases will first be implemented at multiple sites and then extended to additional healthcare facilities in a later phase.

As a fifth component, we have developed a training and continuing education program that provides comparable, modular curricula for M.Sc. programs in medical informatics/medical data science and for postgraduate education. It is also intended to prepare medical students and physicians for a future of large, high-quality data sets that are subject to strict data protection and privacy requirements.