MOSAIC Project

The MOSAIC project was funded by the German Research Foundation (funding number: HO 1937/2-1) and was carried out from 2012 to 2015 at the Institute for Community Medicine (Section Epidemiology of Health Care and Community Health) of University Medicine Greifswald.

The aim of this project was to simplify the implementation of a central data management system while focussing on epidemiological research and data protection.
In 2019 mosaic-greifswald.de was integrated into the new web presence ths-greifswald.de. In individual cases, content was transferred to corresponding documents. E-PIX®, gPAS® and gICS® are centrally provided for use via ths-greifswald.de. You will find the other tools in the following overview.

The scientific advisory board of the MOSAIC project (2014)

The scientific advisory board, consisting of Prof. Dr. Gefeller (FAU Erlangen), Prof. Dr. Dr. Leitzmann (UK Regensburg), Univ.-Prof. Dr. Neugebauer (University Witten/Herdecke) and Prof. Dr. Sax (UMG Göttingen), accompanied and advised the work within the MOSAIC project.

Publications and lectures

Publication

MOSAIC. A modular approach to data management in epidemiological studies. (2015)

Publication

A workflow-driven approach to integrate generic software modules in a Trusted Third Party (2015)

Poster

Zentrales Datenmanagement (DGEpi 2013) [available in German only]

Poster

Praktische Hilfestellung durch Vorlagen, Leitfäden und Empfehlungen (DKVF 2015) [available in German only]

Poster

Kostenfreie Werkzeuge für die epidemiologische Forschung (DKVF 2015) [available in German only]

Poster

E-PIX – Who am I and If so how many (DFG 2014) [available in German only]

Poster

Generische Pseudonymisierung mit gPAS (DGEpi 2013) [available in German only]

Lecture

MOSAIC: praxis-orientierte Unterstützung für Kohortenstudien und Register (DKVF 2015) [available in German only]

The Trusted Third Party Tools

The Trusted Third Party tools were further developed by the University Medicine Greifswald even after the end of the MOSAIC project and are now available for the research community via different portals. The latest version of the software, documentation as well as demos can be found on the respective product pages.

E-PIX®: our tool for record linkage and identity management.

gPAS®: our tool for flexible pseudonym management.

gICS®: our tool for comprehensive consent management.

GitHub

On GitHub, we make the source code of our tools available for research.

Docker

With Docker, you can install our tools yourself in just a few clicks.

TMF ToolPool

In the TMF ToolPool Gesundheitsforschung you will find descriptions, links and scientific references.


plan.Tau – an interactive reference portal for solutions in central data management

The question-and-answer system plan.Tau is a knowledge base that supports the planning and design of central data management for epidemiological research. The goal was to give researchers and IT experts a guiding thread for setting up a central data management system, using the phases of a cohort study as an example. Targeted questions pointed out typical problems, while the corresponding answers offered possible solutions and references to the relevant literature. The resulting web portal was available to the scientific community at https://mosaic-greifswald.de/qa/ until June 2019 and was presented at the DGEpi in Ulm in 2014.

plan.Tau Poster

This poster was a contribution to the DGEpi 2014, presented by Martin Bialke [available in German only]

plan.Tau Q&A Catalogue

Catalogue of all questions and answers of the plan.Tau reference portal (as of June 2019) [available in German only]

Data backup template

Data protection measures are essential for any research project and should be taken during the preparatory phase (in any case, before data collection starts). The slide set provided serves as a short introduction to backup and recovery strategies and raises awareness of the topic. It offers a simple overview of influencing factors and decision criteria for more detailed planning.
Based on experience and additional research, the MOSAIC project developed a sample template for easily creating a data backup and recovery plan. This document is a tool for planning and communicating your backup strategy: it helps you to identify requirements and define adequate measures together with your IT contacts. It also serves as a guide to action in the event of an incident by documenting contact persons, backup artifacts and recovery steps. The completed plan should be made available to all parties involved and stored securely in several locations.

Template

Sample template for easy creation of a backup and recovery plan [available in German only]

Thematic introduction

Thematic introduction to the creation of a data backup and recovery plan [available in German only]

Template for data protection concept

The template provides a prepared document structure for writing a data protection concept for (multicentre) studies and registries. Notes and examples inform the author about the meaning of the respective sections. Targeted questions draw attention to necessary considerations, decisions and potential solutions.

The template aims to guide the author through writing the data protection concept and to supply the required document structure.

Template

Sample template for the simple creation of a data protection concept [available in German only]

Guideline for describing a data dictionary

A data dictionary must be defined and coordinated carefully, as it is the basis and starting point for all subsequent steps in the course of a study or registry. If changes to the study data set or the data dictionary become necessary after the study or registry has started, they have considerable effects on organisation and schedule, and their cost is regularly underestimated.
The aim of the guideline is to give epidemiologists and scientists the most concise and precise support possible in creating a data dictionary. To this end, it lists aspects to be taken into account and provides numerous practical recommendations on how to proceed. This guide is available in German and English. Topics include, among others:

Preconditions
Variable names and characteristics
Typical data types and value ranges
Validity and dependencies
Coding of valid values and missings
Recommendations from practice
Further need for coordination
Templates and examples
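
As an illustration of the topics listed above, the following minimal sketch shows how a data dictionary entry might be expressed in machine-readable form and used to check raw values. It is not taken from the guideline: the `Variable` class, its field names, the `classify` helper and the missing codes `-99` and `-98` are assumptions chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Variable:
    """One hypothetical data dictionary entry (field names are illustrative)."""
    name: str
    label: str
    dtype: str                                          # "integer", "float" or "categorical"
    unit: Optional[str] = None
    min_value: Optional[float] = None                   # lower bound of the value range
    max_value: Optional[float] = None                   # upper bound of the value range
    codes: dict = field(default_factory=dict)           # coding of valid values
    missing_codes: dict = field(default_factory=dict)   # coding of missings


def classify(var: Variable, value) -> str:
    """Return 'missing', 'valid' or 'invalid' for a single raw value."""
    # Coded missings are checked first so they are never range-checked.
    if value is None or value in var.missing_codes:
        return "missing"
    if var.dtype == "categorical":
        return "valid" if value in var.codes else "invalid"
    if var.min_value is not None and value < var.min_value:
        return "invalid"
    if var.max_value is not None and value > var.max_value:
        return "invalid"
    return "valid"


# Two example entries (assumed variables, not from a real study).
sbp = Variable(name="sbp", label="Systolic blood pressure", dtype="integer",
               unit="mmHg", min_value=50, max_value=300,
               missing_codes={-99: "not measured", -98: "refused"})
smoker = Variable(name="smoker", label="Current smoker", dtype="categorical",
                  codes={0: "no", 1: "yes"}, missing_codes={-99: "unknown"})

for var, raw in [(sbp, 120), (sbp, -99), (sbp, 400), (smoker, 1), (smoker, 2)]:
    print(var.name, raw, classify(var, raw))
```

Keeping valid codes and missing codes in separate mappings means a coded missing can never be mistaken for an out-of-range measurement.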

Guideline

Guideline for Describing a Data Dictionary (English)

Guideline on designing eCRF

Epidemiological researchers without comprehensive IT knowledge sometimes use everyday software tools (e.g. MS Excel) for data collection in studies and registries. Technical challenges, such as using a central system for electronic data capture (EDC) and creating the corresponding web forms (electronic Case Report Form, eCRF), present real obstacles.
Building on the description of the data dictionary, this guideline aims to support the design of an eCRF. References to relevant literature and examples, as well as practical recommendations, are intended to improve understanding and reduce the time spent searching for information. Topics include, among others:

Preconditions
Phrasing and structuring questions
Tips for determining input elements
Choosing an eCRF solution
Defining the form using the example of OpenClinica
Tips for generating a questionnaire
Recommendations from practice
Related literature
Directly usable sample eCRFs

Guideline

Guideline on designing eCRF (English)

Library in R for basic data quality assurance

Every epidemiological research project that carries out data collection faces the challenge of continuously checking the quality of the data.

The R library provided focuses on generating generic reports, so that basic plausibility checks can be applied without knowing the units, value ranges and codings of the variables. These reports allow statements about frequency distributions, completeness and extreme values in the data.

The goal of the MOQA library is to visualise data quality as generically as possible for each variable using R. It does so mainly by analysing the frequency of valid values and missings and the distribution of the data, distinguishing between categorical and metric variables. This makes it possible to generate general reports and derive corresponding statements. More specific statements require knowledge of the metadata (e.g. variable description, unit) and coding (e.g. valid answers, missings) of the variables.
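
The kind of generic, metadata-free summary described above can be sketched in a few lines. The example below is written in Python purely for illustration and does not reproduce MOQA's R functions or its report layout. The `summarise` function, the threshold of ten distinct values for treating a variable as metric, and the missing code `-99` are assumptions.

```python
import statistics
from collections import Counter


def summarise(values, missing_codes=(), max_categories=10):
    """Generic per-variable summary that needs no metadata.

    Assumption: a variable counts as metric when all observed values are
    numeric and there are more than `max_categories` distinct values.
    """
    observed = [v for v in values if v is not None and v not in missing_codes]
    n_total = len(values)
    report = {
        "n": n_total,
        "n_missing": n_total - len(observed),
        "completeness": len(observed) / n_total if n_total else 0.0,
    }
    numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool)
                  for v in observed)
    if observed and numeric and len(set(observed)) > max_categories:
        # Metric variable: describe the distribution and its extremes.
        report["type"] = "metric"
        report["min"] = min(observed)
        report["max"] = max(observed)
        report["mean"] = statistics.mean(observed)
        report["median"] = statistics.median(observed)
    else:
        # Categorical variable: count how often each value occurs.
        report["type"] = "categorical"
        report["frequencies"] = dict(Counter(observed))
    return report


ages = [34, 51, 47, None, 62, 29, 58, 41, 73, 66, 38, 55, -99]
sex = ["f", "m", "f", None, "m", "f"]

age_report = summarise(ages, missing_codes=(-99,))
sex_report = summarise(sex)
print(age_report)
print(sex_report)
```

Without metadata, the split between metric and categorical variables can only be a heuristic, which is why more specific statements need the variable descriptions and codings.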

The included sample scripts for metric and categorical data (in CSV format or as a data frame) make it possible to generate reports for single or multiple variables. The 2017 publication (at that time the R package was named “mosaicQA”) summarises the background and scope of the library.

Publication

Publication on MOQA (2017)

CRAN

The R library MOQA in the official CRAN repository

Supplementary material

Details on installing and using MOQA [available in German only]

Toolbox for Research

Limited resources in terms of budget, personnel and IT infrastructure are a common feature of epidemiology and health services research. Especially smaller registries and cohort studies often lack staff with programming skills. For this reason, such studies often use supposedly simple administration procedures for data and participants instead of IT-supported data management including study databases.

Within the MOSAIC project, a flexible software solution for data management in smaller research projects was provided free of charge. This “Toolbox for Research” (short: Toolbox) is suitable for a variety of application scenarios and supports the collection, processing and storage of research data across different locations. The automatic installation of this open-source Toolbox was simplified considerably by the use of Docker and extensive accompanying documentation. Since in smaller research projects an automated separation of medical data (MDAT) and identifying data (IDAT) is difficult to implement without the support of a Trusted Third Party, no person-identifying data should be stored within the Toolbox. To allow the medical research data in the Toolbox to be mapped to the person-identifying data of the clinical context in a traceable way, the Toolbox uses a uniform and, where necessary, transparent pseudonymisation concept.
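
To make the idea of uniform pseudonyms more concrete, here is a minimal sketch of one possible scheme. It is not the Toolbox's actual implementation, which this page does not describe. The function `make_pseudonym`, the `PSN` prefix, the keyed-hash (HMAC) approach and the example identifiers are all assumptions made for illustration.

```python
import hashlib
import hmac


def make_pseudonym(local_id: str, secret_key: bytes,
                   prefix: str = "PSN", length: int = 12) -> str:
    """Derive a deterministic pseudonym from a local identifier (hypothetical scheme)."""
    digest = hmac.new(secret_key, local_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}-{digest[:length].upper()}"


# Assumption: the key and the lookup table stay in the clinical context
# and are never stored inside the research database.
key = b"replace-with-a-randomly-generated-secret"
clinic_ids = ["patient-0001", "patient-0002"]
lookup = {make_pseudonym(pid, key): pid for pid in clinic_ids}

# The research record carries only the pseudonym and medical data.
research_record = {"psn": make_pseudonym("patient-0001", key), "sbp": 120}

# Whoever holds the lookup table can resolve the pseudonym again.
print(research_record["psn"], "->", lookup[research_record["psn"]])
```

In this sketch, the same identifier always yields the same pseudonym, and only the holder of the key and lookup table can map a record back to a person.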

In the publication “Toolbox for Research, or how to facilitate a central data management in small-scale research projects”, the background and technical approach of the Toolbox are explained. In addition, the Open Access publication presents initial experiences and results from the Toolbox’s pilot project in the German Burn Registry. Click here to read the publication: http://rdcu.be/FynH.
