Data management

Data management covers all activities related to handling research data throughout their lifecycle. Good data management should result in FAIR (findable, accessible, interoperable, reusable) data, easier replication of research and verifiable results. Transparent, well-performed data management enhances the credibility of science, can lead to higher citation rates and generally stimulates further knowledge development. The output of this process is usually a data management plan and well-managed data.

“Data Management Plans (DMPs) are a key element of good data management.”

European Commission, 2016.

Data management plan – DMP

A DMP is a document that describes how data are managed in a specific research project: which data will be used, and how they will be used or recreated during the research. It also records their parameters and availability, the steps leading to their acquisition, their storage during the research, the possibilities and limitations of their further use, and their long-term archiving.

The DMP is a “living” document that should be updated and kept current throughout the research project. It is therefore advisable to prepare it with tools that keep the information on data handling in the project up to date and can generate the current version of the DMP on demand.
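As a minimal sketch of the "generate on demand" idea, the information on data handling could be kept as structured records and rendered into a document whenever a current version is needed. The field names, the `render_dmp` helper and the sample project are hypothetical illustrations and do not correspond to any particular DMP tool.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DatasetRecord:
    """One dataset handled in the project (hypothetical fields)."""
    name: str
    storage: str
    sharing: str


@dataclass
class ProjectDMP:
    """Structured records from which the DMP text is generated."""
    project: str
    datasets: list[DatasetRecord] = field(default_factory=list)

    def add_dataset(self, record: DatasetRecord) -> None:
        self.datasets.append(record)


def render_dmp(dmp: ProjectDMP, as_of: date) -> str:
    """Render the current state of the records as a plain-text DMP."""
    lines = [
        f"Data management plan: {dmp.project}",
        f"Version generated on {as_of.isoformat()}",
        "",
    ]
    for number, ds in enumerate(dmp.datasets, start=1):
        lines.append(f"{number}. {ds.name}")
        lines.append(f"   Storage during research: {ds.storage}")
        lines.append(f"   Sharing and archiving: {ds.sharing}")
    return "\n".join(lines)


plan = ProjectDMP(project="Example survey project")
plan.add_dataset(
    DatasetRecord("Survey responses", "institutional server", "archive after anonymisation")
)
first_version = render_dmp(plan, date(2024, 1, 15))

# New data appear during the project: update the records, then regenerate.
plan.add_dataset(
    DatasetRecord("Interview transcripts", "encrypted project share", "restricted access")
)
second_version = render_dmp(plan, date(2024, 6, 1))
print(second_version)
```

Because the document is derived from the records, every generated version reflects the state of the project at that moment and nobody has to edit the text by hand.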

More information about the data management plan

The DMP is intended to be a living document rather than a mere bureaucratic requirement. It should be updated whenever significant changes occur during the project (e.g. new data, or a change in consortium terms or composition) and at project closure. The DMP is part of the research project methodology. If it is well prepared, it makes work easier and saves time, and is therefore a key support tool in the planning and implementation of a research project.

Schema - data management plan

The form of the document is not formally defined, but it is basically a set of topics and answers that vary according to the funder, the established practice of individual disciplines, the nature of the specific research project and, last but not least, the type of data included.

Science Europe, in its “Practical Guide To The International Alignment of Research Data Management”, recommends covering six main topics when developing a DMP (description of data collection or use of existing data; documentation and data quality; storage and backup during research; legal and ethical requirements; data sharing and long-term preservation; responsibility for the DMP and related resources) and answering 15 guiding, clarifying questions. A translation of the Data Management Plan Template for Horizon Europe (bilingual version, CZ and ENG) has been published by the National Technical Library for the purpose of the OP JAK grant calls.
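A draft DMP can be checked against these topics in a few lines. The sketch below is only illustrative: the topic labels are shortened from the list above, treating responsibility and resources as a single topic is an assumption of this sketch, and `missing_topics` is a hypothetical helper, not part of any template or tool.

```python
# Topic labels shortened from the text above; the grouping of the last
# topic (responsibility + resources) is an assumption of this sketch.
CORE_TOPICS = [
    "Data collection or use of existing data",
    "Documentation and data quality",
    "Storage and backup during research",
    "Legal and ethical requirements",
    "Data sharing and long-term preservation",
    "Responsibilities and resources",
]


def missing_topics(answers: dict[str, str]) -> list[str]:
    """Return the core topics that have no non-empty answer yet."""
    return [topic for topic in CORE_TOPICS if not answers.get(topic, "").strip()]


draft = {
    "Data collection or use of existing data": "New survey data plus reused census tables.",
    "Storage and backup during research": "Institutional server with nightly backup.",
    "Legal and ethical requirements": "   ",  # whitespace only counts as unanswered
}

for topic in missing_topics(draft):
    print(f"Not yet covered: {topic}")
```

A check of this kind only confirms that each topic has some answer. Whether the answer satisfies a given funder still has to be judged against that funder's template.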


Practical information and recommendations on data management plans for the social sciences can be found on the website of the Czech Social Science Data Archive (CSDA), Institute of Sociology of the CAS. For the needs of Czech social scientists, CSDA has prepared its own DMP template in Czech, which is largely based on the template of CESSDA ERIC (Consortium of European Social Science Data Archives, European Research Infrastructure Consortium). The template contains a few modifications that CSDA has made based on its own experience with the Czech environment. It also covers the DMP categories required by Horizon 2020.

Tools for creating DMPs

There are several suitable online tools for creating and maintaining DMPs. The tools described below are the most widely used solutions; they are more or less equivalent to each other and are all being further developed.

Specification of the most common tools for DMP creation

ARGOS

An open-source tool developed within the OpenAIRE and EUDAT projects. This online tool allows easy creation of DMPs and a separate description of datasets, for which templates can be used. It is provided as a free service. The tool is linked to ORCID, OpenAIRE services and the Zenodo repository, through which a DMP can be easily published and assigned a DOI.

DMPonline

The tool is developed as open source by the UK Digital Curation Centre (DCC) and contains a large number of public DMPs and data description templates. It is provided as a free service. DMPs can be easily published directly on the DMPonline site.

Data Stewardship Wizard

A comprehensive open-source tool for DMP creation and research data management, developed within the ELIXIR infrastructure as a joint project of ELIXIR CZ and ELIXIR NL. It allows easy creation of custom pre-populated templates, for example for research teams or for the research tools used, which makes preparing further DMPs easier. The tool also includes more complex knowledge models profiled by scientific field, automatically evaluates the degree of fulfilment of the FAIR principles, is linked to the fairsharing.org standards database and encourages users to make the most of controlled vocabularies, which in turn allows existing DMPs to be compared and evaluated. It is free for ELIXIR-affiliated institutions; otherwise it is provided as a paid service. Another option is to run an instance on your own server.

ELIXIR CZ – affiliated institutions of the CAS:

Other data management tools

A number of technical tools can provide useful support for each part of the data lifecycle.

Specification of a few other useful tools

FAIR self-assessment tool

The tool allows you to use simple questions to assess the extent to which your data is FAIR.
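The principle of such a questionnaire can be sketched as follows. The questions, the yes/no answer format and the scoring rule are hypothetical illustrations and are not taken from the self-assessment tool itself.

```python
# Hypothetical yes/no questions grouped by FAIR principle.
QUESTIONS = {
    "Findable": [
        "Does the dataset have a persistent identifier?",
        "Is the dataset described in a searchable catalogue?",
    ],
    "Accessible": [
        "Can the data be retrieved through a standard open protocol?",
        "Does the metadata remain available if the data are withdrawn?",
    ],
    "Interoperable": [
        "Are the data stored in an open, documented format?",
        "Does the metadata use controlled vocabularies?",
    ],
    "Reusable": [
        "Is a clear usage licence attached?",
        "Is the origin of the data documented?",
    ],
}


def fair_scores(answers: dict[str, bool]) -> dict[str, float]:
    """Share of 'yes' answers per principle; unanswered questions count as 'no'."""
    scores = {}
    for principle, questions in QUESTIONS.items():
        yes_count = sum(1 for question in questions if answers.get(question, False))
        scores[principle] = yes_count / len(questions)
    return scores


sample_answers = {
    "Does the dataset have a persistent identifier?": True,
    "Is the dataset described in a searchable catalogue?": True,
    "Can the data be retrieved through a standard open protocol?": True,
}

scores = fair_scores(sample_answers)
for principle, share in scores.items():
    print(f"{principle}: {share:.0%}")
```

A per-principle score shows where a dataset falls short, which a single overall number would hide.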

AMNESIA

OpenAIRE’s research data anonymisation tool allows sensitive data to be redacted so that it can be shared with the general public, reducing the risk of violating the rights of others. The tool can deduplicate uploaded anonymised data when deposited in the Zenodo repository.

OSF

The Open Science Framework is a platform for open and user-friendly management of the entire research process at all stages, from initial conception, through processing, editing and analysis, to the final sharing of publication results and data. It supports registration of a research plan, project management and data management. The environment is connected to many other services and platforms, such as ORCID, Zotero, Mendeley, Google Scholar and Dropbox.

JupyterLab

JupyterLab is the latest web-based interactive development environment for notebooks, code, and data. Its flexible interface allows users to configure and arrange workflows in data science, scientific computing, computational journalism, and machine learning. A modular design invites extensions to expand and enrich functionality.