First qualification course on Research Data Management Plans

The qualification activities of the BioDATEN SDC started with a course on the concept and creation of Data Management Plans (DMPs): "Introduction to Research Data Management – Data Management Plans". The course, held at the Computer Center of the University of Freiburg, was jointly organized by the Freiburg Research Data Management Group and the Science Data Center BioDATEN. It gave a brief overview of the handling of research data and of the different aspects of creating research data management plans that cover the complete life cycle of the respective data. It presented concepts and strategies for creating DMPs in research projects to ensure the long-term reusability and accessibility of electronic research data, including functional issues, documentation and appropriate enrichment with metadata. Furthermore, the planning of archiving strategies, the handling of sensitive data and, last but not least, cost and refinancing models were discussed. Recommendations were given on the structure of a Data Management Plan and on its necessary elements, ranging from standard project metadata to considerations on the number of data sets, the file types and the software involved. Further relevant points were data and software licensing; special attention was given to data citation, which might become part of future credit for research.

The course was jointly organized by the Science Data Center BioDATEN and the Freiburg Research Data Management Group, which is represented by the eScience groups of the Computer Center and the university library. Because of the high demand, another workshop is planned. The range of training courses will also be broadened to cover subject-specific questions, which increasingly arise from the requirements of the various research groups in the bioinformatics community.

BioDATEN present at the Research Data Management Working Group at the Stuttgart university library

The Science Data Center BioDATEN took part in the meeting of the Research Data Management Working Group (AK FDM) at the Stuttgart university library to discuss the challenges of long-term identifiers. Research data management (RDM) has to deal with objects that outlive both the typical duration of a project and the involvement of the people working on a research question. The identification of people has to take into account the dynamics of research, the fluctuation of researchers and ongoing technological change. These challenges need to be considered in the context of data repositories and the Science Data Centers (SDCs). Person identifiers are required to discover information on research and scholarly output and to credit scientists for the results of their research, ideally for the publication of both articles and data.

The referencing of scholarly publications and the sharing of data sets depend on the long-term durability of identifiers. The short-term affiliation of a person with a university or a research institution conflicts with this goal if identity management rests upon local systems or email addresses. Thus, there is a need for stable and unambiguous references to people. Objects like papers, research data, lab notebooks and source code should be linked to one or more researchers in a proper and stable way. Ideally, the references are machine-actionable. Names are problematic: they are not unique, the spelling might be incorrect and bibliographical references might vary. Researchers move around, graduate, start a PhD at a different institution or change names. Additional identifiers like email address, name of the institution and department are not necessarily stable either. The institutions' identity management systems do disambiguate, but their core focus is on active members of the institution, and identities are disabled or deleted when a person retires or moves on.
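The difference between name-based and identifier-based linking can be illustrated with a small Python sketch. All records, names and the `person_id` values below are invented for illustration; the sketch does not describe any BioDATEN service or data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Creator:
    name: str          # as printed on the object; spelling may vary
    affiliation: str   # changes when the researcher moves on
    person_id: str     # hypothetical persistent person identifier


@dataclass(frozen=True)
class ResearchObject:
    title: str
    kind: str          # e.g. "paper", "dataset", "source code"
    creators: tuple    # tuple of Creator instances


def group_titles(objects, key):
    """Group object titles by a key derived from each creator."""
    groups = defaultdict(list)
    for obj in objects:
        for creator in obj.creators:
            groups[key(creator)].append(obj.title)
    return dict(groups)


# Invented records: one person, three name spellings, two institutions.
objects = [
    ResearchObject("Growth assay paper", "paper",
                   (Creator("Anna Müller", "University A", "person-0001"),)),
    ResearchObject("Growth assay raw data", "dataset",
                   (Creator("A. Mueller", "University A", "person-0001"),)),
    ResearchObject("Assay analysis scripts", "source code",
                   (Creator("Anna Muller-Schmidt", "Institute B", "person-0001"),)),
]

# Grouping by name splits the work of one person into three entries,
# grouping by the persistent identifier keeps it together.
by_name = group_titles(objects, key=lambda c: c.name)
by_id = group_titles(objects, key=lambda c: c.person_id)
```

In this toy example `by_name` contains three separate entries, while `by_id` contains a single entry that lists all three objects.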

The BioDATEN project addresses the topic in the work package "1.2.4 Persistent Identifier". As the challenge has been around for a while and is faced by a wide range of institutions and publishers, a number of options are available. The "Gemeinsame Normdatei" (GND, Integrated Authority File) is one such approach for German-speaking countries: a unique identifier that facilitates the collaborative use and administration of authority data. The GND represents and describes entities, i.e. persons, corporate bodies, conferences and events, geographic entities, topics and works relating to cultural and academic collections. The GND is primarily used by libraries to catalogue publications, but other institutions like archives, museums and cultural and academic institutions use the GND as well. Publishers employ, for example, the ResearcherID, which is part of the "Web of Science". It provides identifiers, the Web of Science ResearcherIDs, which institutions and funders use as persistent identifiers to track research output and to update publication records in Web of Science, ensuring correct author attribution and disambiguation. Another commercial approach is the Scopus Author Identifier, a proprietary solution provided by Elsevier. For identity federations, the edu-ID plays a role. Activities concerning edu-ID in Germany are coordinated by a ZKI working group. It aims for a lifelong, user-centric digital identity for research and education. There are different implementations in various federations: in Sweden, SUNET provides one for application and matriculation processes in higher education as well as for the creation of university user IDs. The Swiss edu-ID is the common digital identity for the academic sector.

ORCID, a novel approach to user-centric global person identifiers, was started in 2010/11 with a Mellon grant and support from several other sponsors. It transcends disciplinary, geographic, national and institutional boundaries. Participation in ORCID is open to any organization that has an interest in scholarly communications. Access to ORCID services is based on transparent and non-discriminatory terms. Institutions can become members for an annual service fee; the University of Freiburg, for example, joined ORCID in February 2018. ORCID is governed by representatives from a broad cross-section of stakeholders, the majority of whom are not-for-profit. All software developed by ORCID will be released under a license approved by the Open Source Initiative. Researchers can create, edit and maintain an ORCID iD and profile free of charge, and they control the privacy settings of their own ORCID record. All profile data contributed to ORCID by researchers or claimed by them will be available in standard formats for free download, subject to the researchers' own privacy settings. ORCID identifiers and profile data are made available via free APIs and services. ORCID allows for account linking with identity federations, e.g. to use Shibboleth federations. This requires the respective institution to allow ORCID as a Service Provider. ORCID itself can already act as an Attribute Provider based on OAuth2. To use this in services provided by the institution, application integration is necessary. This will be discussed and evaluated for the envisioned BioDATEN services.
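One small, concrete piece of such an application integration is checking that an identifier entered by a user is well formed before it is stored. The text above does not describe the identifier format; according to ORCID's public documentation, an ORCID iD consists of 16 characters in four hyphen-separated groups, the last of which is a check character computed with the ISO 7064 MOD 11-2 algorithm. The following Python sketch implements that check under this assumption. The function names are hypothetical, and the check only detects typing errors; it does not tell whether the iD is actually registered.

```python
import re

# Four groups of four characters; only the last character may be "X".
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid: str) -> bool:
    """Return True if the string has the ORCID iD layout and checksum."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    chars = orcid.replace("-", "")
    return orcid_check_character(chars[:15]) == chars[15]


# 0000-0002-1825-0097 is the sample iD of the fictitious researcher
# used in ORCID's own documentation.
print(is_well_formed_orcid("0000-0002-1825-0097"))  # True
print(is_well_formed_orcid("0000-0002-1825-0098"))  # False (wrong checksum)
```

A service would typically run such a check on user input and then confirm the iD through an authenticated ORCID sign-in rather than trust a typed value.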

ORCID is already “GDPR certified”. This certification is the result of a long process initiated by the German chapter. Together with its openness and independence, this makes ORCID a prime candidate for long-term person identification for BioDATEN research data sets. There are some caveats: Most users are self-registered and may not be using their institution's email address. There is no standardized endorsement process: an iD can be linked to institutional frameworks, but this is not required. An institutional account is difficult to distinguish from a private one. Unifying organisational and institutional IDs globally remains a challenge. Finally, ORCID users should be aware that using ORCID has some implications: Using the ORCID API requires customising software components. A persistent lifelong identifier allows users to be tracked indefinitely. ORCID is a US-based, not-for-profit organisation, which as such is not bound to the Safe Harbour agreement. Nevertheless, it is currently the most open and viable solution available, and it seems likely that ORCID will become a worldwide standard. BioDATEN will strive for a coordinated solution within the State of Baden-Württemberg and opt for ORCID as the most widespread and versatile person identifier. This could mean adding ORCID to IDM systems, providing coordinated endorsements and exchanging ORCID iDs via the bwIDM federation (as an attribute).

Start of Teaching Series on Research Data Management in Bioinformatics - Workshop on Data Management Plans in Freiburg

Research processes produce an increasing amount of digital data. These data are often very discipline-specific and exist in different forms. They can be the basis as well as the result of research. Preserving, managing and curating research data thus becomes a central task for every scientist and research institution - from the preparation of a research proposal to everyday research work. This process must be structured and organised. An increasingly established solution is the use of data management plans (DMPs). A DMP can primarily be understood as an abstract concept that helps to define data management over the planned course of the research project and to secure the subsequent long-term availability of the data.

A DMP structures the handling of research data over their life cycle. It takes into account which data sets are required or generated, as well as their licensing, enrichment with metadata, necessary processing steps and software, and ownership over time. The aim of the event is to explore the manifold questions surrounding data management and to enable the participants to create such a plan themselves. The course will cover the following topics:

  • Introduction to research data management
  • Presentation of the individual components of a data management plan: Collection of project metadata, description of the data genesis or data stock, data management, consolidation and archiving, exchange and standardization
  • Development and design of a data management plan
  • Digital data management in the research proposal
  • Presentation of online help tools (e.g. RDMO) and example DMPs (BMBF, DFG)
  • Institutional support
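As a rough illustration of the components named in the list above, a data management plan can be thought of as a structured record that is filled in section by section. The following Python sketch is only that: the class, its field names and the example values are hypothetical and do not correspond to the RDMO data model or to any funder template.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DataManagementPlan:
    # One field per component from the topic list; names are illustrative.
    project_metadata: dict = field(default_factory=dict)
    data_genesis: str = ""                  # data genesis or existing data stock
    data_management: str = ""
    consolidation_and_archiving: str = ""
    exchange_and_standardization: str = ""


def missing_components(plan: DataManagementPlan) -> list:
    """Return the names of all components that are still empty."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name)]


# Invented draft: only the first two components have been filled in.
draft = DataManagementPlan(
    project_metadata={"title": "Example project", "funder": "DFG"},
    data_genesis="Sequencing runs produced in the project (FASTQ files).",
)

print(missing_components(draft))
```

A check like this can serve as a simple reminder of which parts of a plan are still open while a proposal is being drafted.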

The course is part of the university's professional qualification programme. Registration is possible via the Campus Management System.
