Project Description

The MOSAIC project was funded by the German Research Foundation (funding number: HO 1937/2-1) and was carried out from 2012 to 2015 at the Institute for Community Medicine (Section Epidemiology of Health Care and Community Health) of University Medicine Greifswald.

The aim of this project was to simplify the implementation of a central data management system while focussing on epidemiological research and data protection.
In 2019, the project's web content was integrated into the new web presence. In individual cases, content was transferred to corresponding documents. E-PIX®, gPAS® and gICS® are provided centrally for use. The other tools are listed in the following overview.

The scientific advisory board of the MOSAIC project (2014)

The scientific advisory board, consisting of Prof. Dr. Gefeller (FAU Erlangen), Prof. Dr. Dr. Leitzmann (UK Regensburg), Univ.-Prof. Dr. Neugebauer (University Witten/Herdecke) and Prof. Dr. Sax (UMG Göttingen), accompanied and advised the work within the MOSAIC project.

Publications and lectures

The Trusted Third Party Tools

University Medicine Greifswald continued to develop the Trusted Third Party tools after the end of the MOSAIC project, and they are now available to the research community via different portals. The latest software versions, documentation and demos can be found on the respective product pages.

plan.Tau – an interactive reference portal for solutions in central data management

The question-and-answer system plan.Tau is a knowledge base that supports the planning and design of central data management for epidemiological research. The goal was to give researchers and IT experts a guiding thread for setting up a central data management system, using the phases of cohort studies as an example. Targeted questions pointed out typical problems, and the corresponding answers offered possible solutions and references to the relevant literature. The resulting web portal was available to the scientific community until June 2019 and was presented at the DGEpi in Ulm in 2014.

Data backup template

Data backup measures are essential for any research project and should be put in place during the preparatory phase (in any case, before data collection starts). The set of slides provided serves as a short introduction to backup and recovery strategies and raises awareness of the topic. It offers a simple overview of influencing factors and decision criteria for more detailed planning.
Based on experience and additional research, the MOSAIC project developed a sample template for creating a data backup and recovery plan. This document is a tool for planning and communicating your backup strategy: it helps you to identify requirements and to define adequate measures together with your IT contact persons. It also serves as a guide to action in the event of data loss or damage by documenting contact persons, backup artifacts and recovery steps. The completed plan should be made available to all parties involved and stored securely in several locations.

Template for data protection concept

The template provides a prepared document structure for writing a data protection concept for (multicentre) studies and registries. Notes and examples inform the author about the meaning of the respective sections. Targeted questions draw attention to necessary considerations, decisions and potential solutions.

The aim of the template is to give the author a guiding thread and the necessary formal structure for creating a data protection concept.

Guideline for describing a data dictionary

The data dictionary must be defined and coordinated carefully, as it is the basis and starting point for all subsequent steps in the course of a study or registry. If changes to the study data set or the data dictionary become necessary after the start of the study or registry, they have considerable organisational and scheduling consequences, the cost of which is regularly underestimated.
The aim of the guideline is to give epidemiologists and scientists concise and precise support in creating a data dictionary. To this end, it lists the aspects to be taken into account and provides numerous practical recommendations on how to proceed. The guideline is available in German and English. Topics include:

  • Preconditions
  • Variable names and characteristics
  • Typical data types and value ranges
  • Validity and dependencies
  • Coding of valid values and missings
  • Recommendations from practice
  • Further need for coordination
  • Templates and examples
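
The topics above (variable names, data types, value ranges, coding of valid values and missings) can be illustrated with a small sketch of a single data dictionary entry. This is only an illustration and is not taken from the guideline: the class name `Variable`, its field names and the example variables and codes are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Variable:
    """One data dictionary entry (all field names are hypothetical)."""
    name: str                       # short, unique variable name
    label: str                      # human-readable description
    dtype: str                      # "integer", "float" or "categorical"
    unit: Optional[str] = None
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    codes: dict = field(default_factory=dict)          # coded valid values
    missing_codes: dict = field(default_factory=dict)  # coded missings

    def classify(self, value):
        """Return 'missing', 'valid' or 'invalid' for a raw value."""
        if value is None or value in self.missing_codes:
            return "missing"
        if self.dtype == "categorical":
            return "valid" if value in self.codes else "invalid"
        # Booleans are not accepted as numbers.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return "invalid"
        if self.dtype == "integer" and not isinstance(value, int):
            return "invalid"
        if self.minimum is not None and value < self.minimum:
            return "invalid"
        if self.maximum is not None and value > self.maximum:
            return "invalid"
        return "valid"


# Two example entries (assumed variables, ranges and codes).
sbp = Variable(
    name="sbp", label="Systolic blood pressure", dtype="integer",
    unit="mmHg", minimum=50, maximum=300,
    missing_codes={-99: "not measured"},
)
smoker = Variable(
    name="smoker", label="Smoking status", dtype="categorical",
    codes={0: "never", 1: "former", 2: "current"},
    missing_codes={-98: "refused"},
)

print(sbp.classify(120), sbp.classify(-99), smoker.classify(5))
```

Keeping the missing codes separate from the valid codes, as in this sketch, makes it possible to distinguish "no answer" from "implausible answer" in later quality checks.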

Guideline on designing eCRF

Epidemiological researchers without comprehensive IT knowledge sometimes use everyday software tools (e.g. MS Excel) for data collection in studies and registries. Technical challenges, such as using a central system for electronic data capture (EDC) and creating the corresponding web forms (electronic Case Report Forms, eCRFs), present real obstacles.
Building on the description of the data dictionary, this guideline aims to support the design of an eCRF. References to relevant literature and examples, as well as practical recommendations, are intended to improve understanding and to reduce the research effort required. Topics include:

  • Preconditions
  • Phrasing and structuring questions
  • Tips for determining input elements
  • Choosing an eCRF solution
  • Defining the form using the example of OpenClinica
  • Tips for generating a questionnaire
  • Recommendations from practice
  • Related literature
  • Directly usable sample eCRFs

Library in R for basic data quality assurance

Every epidemiological research project that carries out data collection faces the challenge of continuously checking the quality of the data.

So that basic plausibility checks can be applied without knowing the units, value ranges and codings of the variables concerned, the R library provided focuses on generating reports in a generic way. These reports allow statements to be made about frequency distributions, completeness and extreme values in the data.

The goal of the MOQA library is to visualise the quality of the data as generically as possible for each variable using R. This is done mainly by analysing the frequency of valid values and missings and the distribution of the data, and by distinguishing between categorical and metric data. This makes it possible to generate general reports and to derive corresponding statements. More specific statements require knowledge of the metadata (e.g. variable description, unit) and coding (e.g. valid answers, missings) of the variables.
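
The kind of generic, per-variable summary described above can be sketched roughly as follows. This is a plain Python illustration of the idea only; it does not use or reproduce the MOQA R library. The function name `summarise`, its parameters and the threshold used to tell metric from categorical data are assumptions made for this example.

```python
from collections import Counter
from statistics import mean, median


def summarise(values, missing_codes=(), max_categories=10):
    """Generic per-variable quality summary (hypothetical helper, not MOQA)."""
    missing = set(missing_codes)
    valid = [v for v in values
             if v is not None and v != "" and v not in missing]
    report = {
        "n": len(values),
        "n_valid": len(valid),
        "n_missing": len(values) - len(valid),
    }
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in valid
    )
    # Assumed heuristic: numeric data with many distinct values is metric.
    if valid and numeric and len(set(valid)) > max_categories:
        report["kind"] = "metric"
        report.update(min=min(valid), max=max(valid),
                      mean=mean(valid), median=median(valid))
    else:
        report["kind"] = "categorical"
        report["frequencies"] = dict(Counter(valid))
    return report


# Example data (invented): ages with a coded missing, and a text variable.
age_report = summarise(list(range(20, 40)) + [None, -99], missing_codes=[-99])
sex_report = summarise(["f", "m", "f", "", "f"])

print(age_report)
print(sex_report)
```

Without metadata the sketch can only guess whether a variable is metric or categorical, which mirrors the limitation noted above: more specific statements need the variable descriptions and codings.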

The included sample scripts for metric and categorical data (in CSV format or as a data frame) make it possible to generate reports for single or multiple variables. The 2017 publication (at that time the R package was named “mosaicQA”) summarises the background and scope of the library.

Toolbox for Research

Limited resources in terms of budget, personnel and IT infrastructure are a common feature of epidemiology and health services research. Especially smaller registries and cohort studies often lack staff with programming skills. For this reason, such studies often use supposedly simple administration procedures for data and participants instead of IT-supported data management including study databases.

Within the MOSAIC project, a flexible software solution for data management in smaller research projects was provided free of charge. This “Toolbox for Research” (short: Toolbox) is suitable for a variety of application scenarios and supports the collection, processing and storage of research data across different locations. The use of Docker and extensive accompanying documentation considerably simplified the automatic installation of this open-source Toolbox. Since in smaller research projects an automated separation of medical data (MDAT) and identifying data (IDAT) is difficult to implement without the support of a Trusted Third Party, no person-identifying data should be stored within the Toolbox. To allow the medical research data within the Toolbox to be mapped to person-identifying data from the clinical context in a traceable way, the Toolbox uses a uniform and, if necessary, transparent pseudonymisation concept.
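
The general idea of keeping person-identifying data out of the research database while still allowing a traceable mapping can be sketched as follows. This is a minimal illustration under stated assumptions and not the Toolbox's actual pseudonymisation scheme: the class `PseudonymRegistry`, the pseudonym format and the example record are all hypothetical.

```python
import secrets


class PseudonymRegistry:
    """Clinic-side mapping between local patient IDs and study pseudonyms.

    Hypothetical sketch: the mapping is held outside the research database.
    """

    def __init__(self, prefix="PSN"):
        self.prefix = prefix
        self._by_patient = {}
        self._by_pseudonym = {}

    def pseudonym_for(self, patient_id):
        """Return the existing pseudonym for a patient or create a new one."""
        if patient_id in self._by_patient:
            return self._by_patient[patient_id]
        while True:
            # Random pseudonym: carries no information about the patient.
            candidate = f"{self.prefix}-{secrets.token_hex(4).upper()}"
            if candidate not in self._by_pseudonym:
                break
        self._by_patient[patient_id] = candidate
        self._by_pseudonym[candidate] = patient_id
        return candidate

    def resolve(self, pseudonym):
        """Re-identify a pseudonym; only possible where the registry is held."""
        return self._by_pseudonym[pseudonym]


registry = PseudonymRegistry()
psn = registry.pseudonym_for("clinic-A/000123")

# The research record stores the pseudonym and medical data only.
record = {"pseudonym": psn, "burn_area_percent": 12.5}
print(record)
```

In this sketch the same patient always receives the same pseudonym, and re-identification requires access to the registry, which would remain in the clinical context.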

The publication “Toolbox for Research, or how to facilitate a central data management in small-scale research projects” explains the background and technical approach of the Toolbox. In addition, this Open Access publication presents initial experiences and results from the Toolbox's pilot project in the German Burn Registry.

In case of questions or suggestions concerning the MOSAIC project you can contact us here.